Abstract

Calibration methods have been widely studied in survey sampling over the last decades. Viewing calibration as an inverse problem, we extend the calibration technique by using a maximum entropy method. The optimal weights are found by considering random weights and looking for a discrete distribution that maximizes an entropy under the calibration constraint. This method provides a new framework for computing such estimates and for investigating their statistical properties.
Introduction

Calibration is a widespread method for improving estimation in survey sampling by using extra information from an auxiliary variable. This method provides approximately unbiased estimators with variance smaller than that of the usual Horvitz-Thompson estimator (see for example [15]). Calibration was introduced by Deville and Särndal in [2], extending an idea of [3]. For general references, we refer to [20] and [19], and for an extension to variance estimation to [17].

Finding the solution to a calibration equation involves minimizing an energy under some constraint. More precisely, let $s$ be a random sample of size $n$ drawn from a population $U$ of size $N$, let $y$ be the variable of interest and $x$ a given auxiliary variable, for which the mean $t_x$ over the population is known. Further, let $d \in \mathbb{R}^n$ be the standard sampling weights (that is, the Horvitz-Thompson ones). Calibration derives an estimator $\hat t_y = N^{-1}\sum_{i\in s} w_i y_i$ of the population mean $t_y$ of $y$. The weights $w_i$ are chosen to minimize a dissimilarity (or distance) $D(\cdot, d)$ on $\mathbb{R}^n$ with respect to the Horvitz-Thompson weights $d_i$, under the constraint

$$N^{-1}\sum_{i\in s} w_i x_i = t_x. \qquad (1)$$

Here we view calibration as a linear inverse problem. In this paper, we use the Maximum Entropy Method on the Mean (MEM) to build the calibration weights. Indeed, MEM is a powerful tool for solving linear inverse problems. It tackles a linear inverse problem by finding a measure maximizing an entropy under some suitable constraint. It has been extensively studied and used in many applications (see for example [...]).

Let us roughly explain how MEM works in our context. First, we fix a prior probability measure $\nu$ on $\mathbb{R}^n$ with mean value equal to $d$. The idea is then to modify the weights in the sample mean in order to get a representative sample for the auxiliary variable $x$, while staying as close as possible to $d$, which has the desirable property of yielding an unbiased estimate of the mean. So we look for a posterior probability measure minimizing the entropy (or Kullback information) with respect to $\nu$ and satisfying a constraint related to (1). It turns out that the MEM estimator is in fact a specific calibration estimator for which the corresponding dissimilarity $D(\cdot, d)$ is determined by the choice of the prior distribution $\nu$. Hence, the MEM methodology provides a general Bayesian framework to fully understand calibration procedures in survey sampling, where the different choices of dissimilarities appear as different choices of prior distributions.

An important problem when studying calibration methods is to understand the amount of information contained in the auxiliary variable. Indeed, the relationship between the variable to be estimated and the auxiliary variable is crucial to improve estimation (see for example [13] or [20]). When complete auxiliary information is available, the correlation between the variables can be increased by replacing the auxiliary variable $x$ by some function of it, say $u(x)$. We therefore consider efficiency issues for a collection of calibration estimators, depending on both the choice of the auxiliary variable and the dissimilarity. Finally, we provide an optimal way of building an efficient estimator using the MEM methodology.

The article is organized as follows. Section 1 recalls the calibration method in survey sampling, while Section 2 presents the MEM methodology in a general framework, together with its application to calibration and instrument estimation. Section 3 is devoted to the choice of a data-driven calibration constraint in order to build an efficient calibration estimator; this estimator is shown to be optimal under strong asymptotic assumptions on the sampling design. Simulations illustrate the previous results in Section 4, while the proofs are postponed to Section 5.

1 Calibration Estimation of a linear parameter 

Consider a large population $U = \{1, \dots, N\}$ and an unknown characteristic $y = (y_1, \dots, y_N) \in \mathbb{R}^N$. Our aim is to estimate its mean $t_y := N^{-1}\sum_{i\in U} y_i$ when only a random subsample $s$ of the whole population is available. So the observed data are $(y_i)_{i\in s}$.
The sampling design is the probability distribution $p$ defined for each subset $s \subset U$ as the probability $p(s)$ that $s$ is observed. We assume that $\pi_i := p(i \in s) = \sum_{s \ni i} p(s)$ is strictly positive for all $i \in U$, so $d_i = 1/\pi_i$ is well defined. A standard estimator of $t_y$ is given by the Horvitz-Thompson estimator:

$$\hat t_y^{HT} = N^{-1}\sum_{i\in s} d_i y_i.$$

This estimator is unbiased and is widely used in practice; see for instance [3] for a complete survey.
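As a sanity check of the unbiasedness claim, here is a minimal sketch that enumerates every sample of an assumed design, simple random sampling without replacement (so $\pi_i = n/N$), on a small hypothetical population, and averages $\hat t_y^{HT}$ over all samples. The population values and the helper name `horvitz_thompson_mean` are illustrative only.

```python
from itertools import combinations
from math import comb


def horvitz_thompson_mean(sample, y, pi, N):
    """HT estimator of the population mean: N^{-1} * sum_{i in s} y_i / pi_i."""
    return sum(y[i] / pi[i] for i in sample) / N


# Hypothetical toy population (illustration only).
y = [3.0, 7.0, 1.0, 4.0, 10.0]
N, n = len(y), 2
pi = [n / N] * N                      # SRS without replacement: pi_i = n / N
samples = list(combinations(range(N), n))
p_s = 1 / comb(N, n)                  # every sample of size n is equally likely

# Design expectation E_p[t_y^HT] = sum_s p(s) * t_y^HT(s)
expected = sum(p_s * horvitz_thompson_mean(s, y, pi, N) for s in samples)
t_y = sum(y) / N
print(expected, t_y)                  # both equal 5.0 up to rounding
```

Any other design could be plugged in by replacing `pi` and `p_s` with its own inclusion and sample probabilities.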

Suppose that there exists an auxiliary vector variable $x = (x_1, \dots, x_N)$ that is entirely observed, and set $t_x = N^{-1}\sum_{i\in U} x_i$. If the Horvitz-Thompson estimator of $t_x$, namely $\hat t_x^{HT} = N^{-1}\sum_{i\in s} d_i x_i$, is far from the true value $t_x$, the sample may not describe well the behavior of the variable of interest in the whole population. So, to prevent biased estimation due to bad sample selection, inference on the sample can be achieved by modifying the weights of the individuals chosen in the sample.

One of the main methodologies used to correct this effect is the calibration method (see [2]). The bad sample effect is corrected by deriving new weights for the sample mean, which remain close to the $d_i$'s in order to keep the bias small. For this, consider a class of weighted estimators $N^{-1}\sum_{i\in s} w_i y_i$, where the weights $w = (w_i)_{i\in s}$ are selected to be close to $d = (d_i)_{i\in s}$ under the calibration constraint

$$N^{-1}\sum_{i\in s} w_i x_i = t_x.$$

There are two basic components in the construction of calibration estimators, namely a dissimilarity and a set of calibration equations. Let $w \mapsto D(w, d)$ be a dissimilarity between some weights and the Horvitz-Thompson ones, and assume that this dissimilarity is minimal for $w_i = d_i$. The method consists in choosing weights minimizing $D(\cdot, d)$ under the constraint $N^{-1}\sum_{i\in s} w_i x_i = t_x$.

A typical dissimilarity is the $\chi^2$-type distance $w \mapsto \sum_{i\in s}(\pi_i w_i - 1)^2/(q_i\pi_i) = \sum_{i\in s}(w_i - d_i)^2/(d_i q_i)$, for $(q_i)_{i\in s}$ a positive smoothing sequence (see [2]). The new estimator is then defined as $\hat t_y = N^{-1}\sum_{i\in s} w_i y_i$, where the weights $w_i$ minimize $D(w, d) = \sum_{i\in s}(\pi_i w_i - 1)^2/(q_i\pi_i)$ under the constraint $N^{-1}\sum_{i\in s} w_i x_i = t_x$. Denoting by $a^t$ the transpose of $a$, the solution of this minimization problem is given by

$$\hat t_y = \hat t_y^{HT} + (t_x - \hat t_x^{HT})^t\hat B,$$

where $\hat B = \big[\sum_{i\in s} d_i q_i x_i x_i^t\big]^{-1}\sum_{i\in s} d_i q_i y_i x_i$. Note that this is a generalized regression estimator. It is natural to consider alternative dissimilarities, which are given in [...]. We first point out that the existence of a solution to the constrained minimization problem depends on the choice of the dissimilarity. Moreover, different choices can lead to weights with different behaviors, and to ranges of values for the weights that users may find unacceptable. We propose an approach where dissimilarities have a probabilistic interpretation, which highlights the properties of the resulting estimators.
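The closed form of the $\chi^2$ calibration estimator can be checked numerically. The sketch below, an illustration on a hypothetical sample of $n = 4$ units from $N = 40$ with auxiliary vector $(1, x_i)$, computes the Lagrange-multiplier weights $w_i = d_i(1 + q_i x_i^t\lambda)$ and compares $N^{-1}\sum_i w_i y_i$ with the generalized regression form $\hat t_y^{HT} + (t_x - \hat t_x^{HT})^t\hat B$. The helper names `solve` and `chi2_calibration` are our own.

```python
def solve(A, b):
    """Solve the small dense system A z = b (Gaussian elimination with partial pivoting)."""
    k = len(b)
    M = [[float(v) for v in row] + [float(b[r])] for r, row in enumerate(A)]
    for c in range(k):
        p = max(range(c, k), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, k):
            f = M[r][c] / M[c][c]
            for j in range(c, k + 1):
                M[r][j] -= f * M[c][j]
    z = [0.0] * k
    for r in range(k - 1, -1, -1):
        z[r] = (M[r][k] - sum(M[r][j] * z[j] for j in range(r + 1, k))) / M[r][r]
    return z


def chi2_calibration(d, q, xs, t_x, N):
    """Weights minimizing sum_i (w_i - d_i)^2 / (d_i q_i) s.t. N^{-1} sum_i w_i x_i = t_x."""
    n, k = len(d), len(t_x)
    T = [[sum(d[i] * q[i] * xs[i][a] * xs[i][b] for i in range(n)) for b in range(k)]
         for a in range(k)]
    tx_ht = [sum(d[i] * xs[i][a] for i in range(n)) / N for a in range(k)]
    lam = solve(T, [N * (t_x[a] - tx_ht[a]) for a in range(k)])
    # Stationarity of the Lagrangian gives w_i = d_i (1 + q_i x_i^t lam)
    return [d[i] * (1 + q[i] * sum(xs[i][a] * lam[a] for a in range(k))) for i in range(n)]


# Hypothetical sample (illustration only).
N = 40
d = [10.0] * 4
q = [1.0] * 4
xs = [[1.0, 2.0], [1.0, 3.5], [1.0, 1.0], [1.0, 4.0]]
y = [5.0, 8.0, 2.5, 9.0]
t_x = [1.0, 2.8]

w = chi2_calibration(d, q, xs, t_x, N)
t_y_cal = sum(wi * yi for wi, yi in zip(w, y)) / N

# Closed form: t_y^HT + (t_x - t_x^HT)^t B_hat
T = [[sum(d[i] * q[i] * xs[i][a] * xs[i][b] for i in range(4)) for b in range(2)] for a in range(2)]
B_hat = solve(T, [sum(d[i] * q[i] * xs[i][a] * y[i] for i in range(4)) for a in range(2)])
t_y_ht = sum(di * yi for di, yi in zip(d, y)) / N
t_x_ht = [sum(d[i] * xs[i][a] for i in range(4)) / N for a in range(2)]
t_y_greg = t_y_ht + sum((t_x[a] - t_x_ht[a]) * B_hat[a] for a in range(2))
print(t_y_cal, t_y_greg)
```

Both numbers coincide up to rounding, which is the identity stated above.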



2 Maximum Entropy for Survey Sampling

2.1 MEM methodology

Consider the problem of recovering an unknown measure $\mu$ on a measurable space $\mathcal{X}$ under moment conditions. We observe a random sample $T_1, \dots, T_n \sim \mu$. For a given function $x : \mathcal{X} \to \mathbb{R}^k$ and a known quantity $t_x$, we aim to estimate $\mu$ satisfying

$$\int_{\mathcal{X}} x\, d\mu = t_x. \qquad (2)$$

This issue belongs to the class of generalized moment problems with convex constraints (we refer to [...] for general references), which can be solved using maximum entropy on the mean (MEM). The general idea is to modify the empirical distribution $\mu_n = n^{-1}\sum_{i=1}^n\delta_{T_i}$ in order to take into account the additional information on $\mu$ given by the moment equation (2). For this, consider weighted versions $n^{-1}\sum_{i=1}^n p_i\delta_{T_i}$ of the empirical measure, for properly chosen weights $p_i$. The MEM estimator $\hat\mu_n$ of $\mu$ is a weighted version of $\mu_n$ where the weights are the expectation of a random variable $P = (P_1, \dots, P_n)$ drawn from a finite measure $\nu^*$ close to a prior $\nu$. This prior distribution conveys the information that $\hat\mu_n$ must be close to the empirical distribution $\mu_n$. More precisely, let us first define the relative entropy, or Kullback information, between two finite measures $Q, R$ on a space $(\Omega, \mathcal{A})$ by setting

$$K(Q, R) = \begin{cases}\displaystyle\int_\Omega \log\frac{dQ}{dR}\, dQ - Q(\Omega) + R(\Omega) & \text{if } Q \ll R,\\ +\infty & \text{otherwise.}\end{cases}$$

Since this quantity is not symmetric, we call it the relative entropy of $Q$ with respect to $R$. Note also that, in the optimization literature, the entropy is often defined as the opposite of the quantity above, which explains the name maximum entropy method, while with our notation we consider the minimum of the relative entropy.

Given our prior $\nu$, we now define $\nu^*$ as the measure minimizing $K(\cdot, \nu)$ under the constraint that the linear constraint holds in mean:

$$\mathbb{E}_{\nu^*}\Big[n^{-1}\sum_{i=1}^n P_i X_i\Big] = t_x,$$

where we set $X_i = x(T_i)$. We then build the MEM estimator $\hat\mu_n = n^{-1}\sum_{i=1}^n\hat p_i\delta_{T_i}$, where $\hat p = \mathbb{E}_{\nu^*}(P)$.
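For finite measures on a finite set, the relative entropy reduces to a sum. The following is a minimal sketch, assuming the finite-measure form $K(Q,R) = \int\log(dQ/dR)\,dQ - Q(\Omega) + R(\Omega)$, with $K = +\infty$ when $Q$ is not absolutely continuous with respect to $R$; the function name `relative_entropy` is hypothetical.

```python
from math import inf, log


def relative_entropy(Q, R):
    """K(Q, R) for finite measures given as lists of masses on the same finite set."""
    total = 0.0
    for q, r in zip(Q, R):
        if q > 0:
            if r == 0:
                return inf            # Q is not absolutely continuous w.r.t. R
            total += q * log(q / r)   # terms with q = 0 contribute 0 log 0 = 0
    return total - sum(Q) + sum(R)


print(relative_entropy([1.0, 1.0], [1.0, 1.0]))   # 0.0: K(Q, Q) = 0
print(relative_entropy([0.5, 2.0], [1.0, 1.0]))   # positive
print(relative_entropy([1.0, 0.0], [0.0, 1.0]))   # inf
```

Taking $R$ equal to the counting measure (all masses 1) gives $\sum_i (t_i\log t_i - t_i + 1)$, the form that reappears later with Poisson priors.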

This method provides an efficient way to estimate a linear parameter $t_y = \int y\, d\mu$, for $y : \mathcal{X} \to \mathbb{R}$ a given map. The empirical mean $\bar y = \int y\, d\mu_n$ is an unbiased and consistent estimator of $t_y$, but it may not have the smallest variance in this model. We can improve the estimation by considering the MEM estimator $\hat t_y = \int y\, d\hat\mu_n = n^{-1}\sum_{i=1}^n\hat p_i y_i$, which has a lower variance than the empirical mean and is asymptotically unbiased (see [7]).

In many practical situations, the function $x$ is unknown and only an approximation to it, say $x_m$, is available. Under regularity conditions, the efficiency properties of the MEM estimator built with the approximate constraint have been studied in [11] and [12], which introduce the approximate maximum entropy on the mean method (AMEM). More precisely, the AMEM estimate of the weights is defined as the expectation of the variable $P$ under the distribution $\nu^*$ minimizing $K(\cdot, \nu)$ under the approximate constraint

$$\mathbb{E}_{\nu^*}\Big[n^{-1}\sum_{i=1}^n P_i\, x_m(T_i)\Big] = t_x. \qquad (3)$$

It is shown that, under assumptions on $x_m$, the AMEM estimator of $t_y$ obtained in this way is consistent as $n$ and $m$ tend to infinity. This procedure makes it possible to increase the efficiency of a calibration estimator while remaining in a Bayesian framework, as shown in Section 3.

2.2 Maximum entropy method for calibration 

Recall that our original problem is to estimate the population mean $t_y = N^{-1}\sum_{i\in U} y_i$ based on the observations $\{y_i, i \in s\}$ and the auxiliary information $\{x_i, i \in U\}$. We introduce the following notation:

$$\tilde y_i = nN^{-1}d_i y_i, \qquad \tilde x_i = nN^{-1}d_i x_i, \qquad p_i = \pi_i w_i.$$

Note that the variables of interest are rescaled to match the MEM framework. The weights $(p_i)_{i\in s}$ are now identified with a discrete measure on the sample $s$. The Horvitz-Thompson estimator $\hat t_y^{HT} = N^{-1}\sum_{i\in s} d_i y_i = n^{-1}\sum_{i\in s}\tilde y_i$ is the preliminary estimator we aim at improving. The calibration constraint $N^{-1}\sum_{i\in s} w_i x_i = n^{-1}\sum_{i\in s} p_i\tilde x_i = t_x$ stands for the linear condition satisfied by the discrete measure $(p_i)_{i\in s}$. So the calibration problem follows the pattern of maximum entropy on the mean. Let $\nu$ be a prior distribution on the vector of weights $(p_i)_{i\in s}$. The solution $\hat p = (\hat p_i)_{i\in s}$ is the expectation of the random vector $P = (\pi_i W_i)_{i\in s}$ drawn from a posterior distribution $\nu^*$, defined as the minimizer of the Kullback information $K(\cdot, \nu)$ under the condition that the calibration constraint holds in mean:

$$\mathbb{E}_{\nu^*}\Big[n^{-1}\sum_{i\in s} P_i\tilde x_i\Big] = \mathbb{E}_{\nu^*}\Big[N^{-1}\sum_{i\in s} W_i x_i\Big] = t_x.$$
We take the solution $\hat p = \mathbb{E}_{\nu^*}(P)$ and define the corresponding MEM estimator $\hat t_y$ as

$$\hat t_y = n^{-1}\sum_{i\in s}\hat p_i\tilde y_i = N^{-1}\sum_{i\in s}\hat w_i y_i,$$

where we set $\hat w_i = d_i\hat p_i$ for all $i \in s$. Under the following assumptions, we show in Theorem 2.1 that maximum entropy on the mean gives a Bayesian interpretation of calibration methods.

The random weights $P_i, i \in s$ (and therefore the $W_i, i \in s$) are taken to be independent, and we denote by $\nu_i$ the prior distribution of $P_i$. It follows that $\nu = \bigotimes_{i\in s}\nu_i$. Moreover, all prior distributions $\nu_i$ are integrable with mean 1. This last assumption conveys that $p_i$ must be close to 1 or, equivalently, that $w_i = d_i p_i$ must be close to the Horvitz-Thompson weight $d_i$.

Let $\varphi : \mathbb{R} \to \mathbb{R}$ be a closed convex map. The convex conjugate $\varphi^*$ of $\varphi$ is defined as

$$\forall s \in \mathbb{R}, \quad \varphi^*(s) = \sup_{t\in\mathbb{R}}\big(st - \varphi(t)\big).$$

For $\nu$ a probability measure on $\mathbb{R}$, we denote by $\Lambda_\nu$ the log-Laplace transform of $\nu$:

$$\Lambda_\nu(s) = \log\int e^{sx}\, d\nu(x), \qquad s \in \mathbb{R}.$$

Its convex conjugate $\Lambda^*_\nu$ is the Cramér transform of $\nu$. Moreover, denote by $S_\nu$ the interior of the convex hull of the support of $\nu$, and let $D(\nu) = \{s \in \mathbb{R} : \Lambda_\nu(s) < \infty\}$. In the sequel, we always assume that, for all $i$, $\Lambda_{\nu_i}$ is essentially smooth (see [...]) and strictly convex, and that $\nu_i$ is not concentrated on a single point. Essential smoothness means here that if $D(\nu_i) = (-\infty; \alpha_i)$, with $\alpha_i \le +\infty$, then $\Lambda'_{\nu_i}(s)$ goes to $+\infty$ whenever $\alpha_i < +\infty$ and $s$ goes to $\alpha_i$. Notice that, under these assumptions, $\Lambda'_{\nu_i}$ is an increasing bijection between the interior of $D(\nu_i)$ and $S_{\nu_i}$. Moreover, we have the functional equalities $(\Lambda^{*\prime}_{\nu_i})^{-1} = \Lambda'_{\nu_i}$ and [...].
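The Cramér transform can be evaluated numerically straight from its definition as a convex conjugate. The sketch below is an illustration only: it computes $\Lambda_\nu$ for $\nu$ the Poisson distribution with parameter 1 through a truncated series, then maximizes $st - \Lambda_\nu(s)$ by ternary search, which is valid because this objective is concave in $s$. The truncation level `kmax` and the search interval are assumptions chosen for this example; the result can be compared with the closed form $t\log t - t + 1$.

```python
from math import exp, lgamma, log


def log_laplace_poisson1(s, kmax=150):
    """Lambda_nu(s) = log E[exp(s X)] for X ~ Poisson(1), series truncated at kmax."""
    logs = [s * k - 1.0 - lgamma(k + 1) for k in range(kmax + 1)]  # log(e^{sk} e^{-1} / k!)
    m = max(logs)
    return m + log(sum(exp(e - m) for e in logs))                  # log-sum-exp


def cramer_transform(log_laplace, t, lo=-10.0, hi=3.0, iters=200):
    """Lambda*(t) = sup_s (s t - Lambda(s)), by ternary search on [lo, hi]."""
    def objective(s):
        return s * t - log_laplace(s)
    for _ in range(iters):
        m1 = lo + (hi - lo) / 3
        m2 = hi - (hi - lo) / 3
        if objective(m1) < objective(m2):
            lo = m1
        else:
            hi = m2
    return objective((lo + hi) / 2)


for t in (0.5, 1.0, 2.0):
    print(t, cramer_transform(log_laplace_poisson1, t), t * log(t) - t + 1)
```

The search interval must contain the maximizer $s = \Lambda^{*\prime}_\nu(t) = \log t$, which holds here for the three values of $t$ printed.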

Definition: We say that the optimization problem is feasible if there exists a vector $\delta = (\delta_i)_{i\in s} \in \prod_{i\in s} S_{\nu_i}$ such that

$$N^{-1}\sum_{i\in s} d_i\delta_i x_i = t_x.$$

Under these assumptions, the following theorem shows that the solutions $(\hat w_i)_{i\in s}$ are easily tractable.

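Theorem 2.1 (survey sampling as MEM procedure) Assume that the optimization problem is feasible. The MEM estimator $\hat w = (\hat w_1, \dots, \hat w_n)$ minimizes over $\mathbb{R}^n$

$$(w_1, \dots, w_n) \mapsto \sum_{i\in s}\Lambda^*_{\nu_i}(\pi_i w_i)$$

under the constraint $N^{-1}\sum_{i\in s} w_i x_i = t_x$.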
Hence, maximum entropy on the mean leads to calibration estimation, where the dissimilarity is determined by the Cramér transforms $\Lambda^*_{\nu_i}, i \in s$, of the prior distributions $\nu_i$.

Remark (relationship with Bregman divergences): Taking the priors $\nu_i$ in a certain class of measures may lead to specific dissimilarities known as Bregman divergences; we refer to [9] for a definition. In the MEM method, there are two different kinds of priors for which the resulting dissimilarity may be seen as a Bregman divergence. Let $\nu$ be a probability measure with mean 1 such that $\Lambda_\nu$ is a strictly convex function. Then $\Lambda^*_\nu$ defines a Bregman divergence, which plays the role of the dissimilarity resulting from the MEM procedure in the two following situations.

First, consider priors $\nu_i, i \in s$, all equal to $\nu$. A simple calculation shows that the assumptions made on $\nu$ imply $\Lambda^*_\nu(1) = \Lambda^{*\prime}_\nu(1) = 0$ (since $\Lambda_\nu(0) = 0$ and $\Lambda'_\nu(0) = 1$, the mean of $\nu$). The resulting dissimilarity can thus be written as

$$D(w, d) = \sum_{i\in s}\Lambda^*_\nu(\pi_i w_i) = \sum_{i\in s}\big[\Lambda^*_\nu(\pi_i w_i) - \Lambda^*_\nu(1) - \Lambda^{*\prime}_\nu(1)(\pi_i w_i - 1)\big].$$

Here, we recognize the Bregman divergence between the weights $\{\pi_i w_i, i \in s\}$ and 1 associated with the convex function $\Lambda^*_\nu$.

Another possibility is to take prior distributions $\nu_i$ lying in some suitable exponential family. More precisely, define the prior distributions as

$$\forall i \in s,\ \forall x \in \mathcal{X}, \quad d\nu_i(x) = \exp(\alpha_i x + \beta_i)\, d\nu(d_i x),$$

where $\beta_i = -\Lambda_\nu(\Lambda^{*\prime}_\nu(d_i))$ and $\alpha_i = d_i\Lambda^{*\prime}_\nu(d_i)$ are chosen so that $\nu_i$ is a probability measure with mean 1. After some computation, we recover the following dissimilarity

$$D(w, d) = \sum_{i\in s}\big[\Lambda^*_\nu(w_i) - \Lambda^*_\nu(d_i) - \Lambda^{*\prime}_\nu(d_i)(w_i - d_i)\big],$$

which is the Bregman divergence between $w$ and $d$ associated with $\Lambda^*_\nu$.
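The exponential-family construction can be checked numerically once a base prior is fixed. Taking $\nu$ to be the Poisson distribution with parameter 1 (our choice for illustration), the tilted prior $\nu_i$ works out to be the law of $Y/d_i$ with $Y$ Poisson with parameter $d_i$, so that $\Lambda_{\nu_i}(s) = d_i(e^{s/d_i} - 1)$. The sketch computes $\Lambda^*_{\nu_i}(\pi_i w_i)$ by a numerical Legendre transform and compares it with the Bregman divergence of $\Lambda^*_\nu(t) = t\log t - t + 1$ between $w_i$ and $d_i$. The value $d_i = 4$ and the helper names are hypothetical.

```python
from math import exp, log


def legendre(fun, t, lo=-50.0, hi=50.0, iters=300):
    """sup_s (s t - fun(s)) for a convex fun, by ternary search on [lo, hi]."""
    def g(s):
        return s * t - fun(s)
    for _ in range(iters):
        m1, m2 = lo + (hi - lo) / 3, hi - (hi - lo) / 3
        if g(m1) < g(m2):
            lo = m1
        else:
            hi = m2
    return g((lo + hi) / 2)


def bregman(phi, dphi, w, d):
    """Bregman divergence of the convex function phi between w and d."""
    return phi(w) - phi(d) - dphi(d) * (w - d)


def lam_star(t):
    return t * log(t) - t + 1             # Cramer transform of Poisson(1)


def dlam_star(t):
    return log(t)                         # its derivative


d_i = 4.0                                 # hypothetical Horvitz-Thompson weight
pi_i = 1.0 / d_i


def log_laplace_nu_i(s):
    return d_i * (exp(s / d_i) - 1.0)     # tilted prior Poisson(d_i) / d_i, mean 1


results = []
for w in (2.0, 4.0, 7.5):
    lhs = legendre(log_laplace_nu_i, pi_i * w)   # Lambda*_{nu_i}(pi_i w_i)
    rhs = bregman(lam_star, dlam_star, w, d_i)   # = w log(w / d) - w + d
    results.append((lhs, rhs))
    print(w, lhs, rhs)
```

At $w_i = d_i$ both sides vanish, as expected for a divergence.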

2.3 Bayesian interpretation of calibration using MEM 

In a classical presentation, calibration methods heavily rely on the choice of a distance. Here, this choice corresponds to the choice of the prior measures $(\nu_i)_{i\in s}$. We now give the probabilistic interpretation of some commonly used distances.

Stochastic interpretation of some usual calibrated survey sampling estimators 

1. Generalized Gaussian prior.

For a given positive sequence $(q_i)_{i\in s}$, let $W_i$ have the Gaussian distribution $\mathcal{N}(d_i, d_i q_i)$, which corresponds to $\nu_i = \mathcal{N}(1, \pi_i q_i)$. We get

$$\Lambda_{\nu_i}(s) = s + \frac{\pi_i q_i s^2}{2}, \qquad \Lambda^*_{\nu_i}(t) = \frac{(t-1)^2}{2\pi_i q_i}.$$

The calibrated weights in that case minimize the criterion

$$D(w, d) = \sum_{i\in s}\frac{(\pi_i w_i - 1)^2}{2\pi_i q_i}.$$

So we recover, up to a factor $1/2$ that does not change the minimizer, the distance discussed in Section 1. This is one of the main distances used in survey sampling. The choice of the $q_i$ can be seen as the choice of the variance of the Gaussian prior: the larger the variance, the less stress is laid on the distance between the weights and the original Horvitz-Thompson weights.

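2. Exponential prior.

We take for all $i \in s$ the same prior $\nu_i = \nu$, the exponential distribution with parameter 1, so that the prior on the vector of weights is $\nu^{\otimes n}$. We have in that case

$$\forall t \in \mathbb{R}_+^*, \quad \Lambda^*_\nu(t) = -\log t + t - 1.$$

This corresponds to the following dissimilarity

$$D(w, d) = \sum_{i\in s}\big[-\log(\pi_i w_i) + \pi_i w_i - 1\big].$$

We recognize here the Bregman divergence between $(\pi_i w_i)_{i\in s}$ and 1 associated with $\Lambda^*_\nu$, as explained in the previous remark. A direct calculation shows that this is also the Bregman divergence between $w$ and $d$ associated with $\Lambda^*_\nu$: the two distances coincide in that case.

3. Poisson prior.

If we choose as prior $\nu_i = \nu$ for all $i \in s$, where $\nu$ is the Poisson distribution with parameter 1, then we obtain

$$\forall t \in \mathbb{R}_+^*, \quad \Lambda^*_\nu(t) = t\log t - t + 1.$$

So we have the following contrast

$$D(w, d) = \sum_{i\in s}\big[\pi_i w_i\log(\pi_i w_i) - \pi_i w_i + 1\big],$$

and we recover the Kullback information where $(\pi_i w_i)_{i\in s}$ is identified with a discrete measure on $s$.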
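The closed forms for the three priors above can be cross-checked against the Fenchel equality $\Lambda^*(\Lambda'(s)) = s\Lambda'(s) - \Lambda(s)$, together with $\Lambda^*(1) = 0$. The sketch below is illustrative: the tuple layout `(Lambda, Lambda', Lambda*)`, the helper names and the sample values are our own assumptions. It also checks that the Gaussian term $(\pi_i w_i - 1)^2/(2\pi_i q_i)$ equals $(w_i - d_i)^2/(2 d_i q_i)$.

```python
from math import exp, log


def gaussian(v):
    """N(1, v): (Lambda, Lambda', Lambda*) with Lambda(s) = s + v s^2 / 2."""
    return (lambda s: s + v * s * s / 2,
            lambda s: 1 + v * s,
            lambda t: (t - 1) ** 2 / (2 * v))


EXPONENTIAL = (lambda s: -log(1 - s),       # Exp(1); finite for s < 1
               lambda s: 1 / (1 - s),
               lambda t: t - 1 - log(t))    # defined for t > 0

POISSON = (lambda s: exp(s) - 1,            # Poisson(1)
           lambda s: exp(s),
           lambda t: t * log(t) - t + 1)    # defined for t > 0


def fenchel_gap(prior, s):
    """Lambda*(Lambda'(s)) - (s Lambda'(s) - Lambda(s)); zero when Lambda* is the conjugate."""
    lam, dlam, lam_star = prior
    return lam_star(dlam(s)) - (s * dlam(s) - lam(s))


def dissimilarity(lam_star, pi, w):
    """D(w, d) = sum_i Lambda*(pi_i w_i) for identical priors nu_i = nu."""
    return sum(lam_star(p * wi) for p, wi in zip(pi, w))


for name, prior in (("gaussian", gaussian(0.3)), ("exponential", EXPONENTIAL), ("poisson", POISSON)):
    print(name, [fenchel_gap(prior, s) for s in (-0.5, 0.0, 0.4)], prior[2](1.0))

# Poisson dissimilarity for hypothetical weights (pi_i = 0.1, d_i = 10)
print(dissimilarity(POISSON[2], [0.1] * 3, [10.0, 12.0, 8.0]))
```

The evaluation points for the exponential prior stay below 1, the boundary of $D(\nu)$.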
MEM thus leads to a classical calibration problem, whose solution is defined as the minimizer of a convex function subject to linear constraints. The following result gives another expression of the solution, which may be easier to compute in practice.

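Proposition 2.2 Assume that the optimization problem is feasible. Then the MEM estimator $\hat w$ is given by

$$\forall i \in s, \quad \hat w_i = d_i\Lambda'_{\nu_i}(\hat\lambda^t d_i x_i), \qquad (4)$$

where $\hat\lambda$ minimizes over $\mathbb{R}^k$ the function $\lambda \mapsto N^{-1}\sum_{i\in s}\Lambda_{\nu_i}(\lambda^t d_i x_i) - \lambda^t t_x$.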

We endow $y$ with the new weights, obtaining the MEM estimator $\hat t_y = N^{-1}\sum_{i\in s}\hat w_i y_i$. Calibration within the maximum entropy framework thus turns into a general convex optimization program, which can be solved easily. Indeed, computing the new weights $\hat w_i, i \in s$, only involves a two-step procedure. First, we find the unique $\hat\lambda \in \mathbb{R}^k$ such that

$$N^{-1}\sum_{i\in s} d_i\Lambda'_{\nu_i}(\hat\lambda^t d_i x_i)\, x_i - t_x = 0.$$

This is achieved by minimizing the real-valued convex function of Proposition 2.2. Then, we compute the new weights $\hat w_i = d_i\Lambda'_{\nu_i}(\hat\lambda^t d_i x_i)$.
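The two-step procedure can be sketched as follows, assuming damped Newton iterations on the dual objective of Proposition 2.2 (any convex solver would do; this choice and all helper names are ours, not the paper's). Each prior is supplied through $\Lambda_{\nu_i}$ and its first two derivatives. On a hypothetical sample, the Poisson prior yields the strictly positive weights $w_i = d_i\exp(\hat\lambda^t d_i x_i)$, and the Gaussian prior $\mathcal{N}(1, \pi_i q_i)$ with $q_i = 1$ reproduces the $\chi^2$ calibration weights $d_i(1 + \lambda^t x_i)$.

```python
from math import exp


def solve(A, b):
    """Solve the small dense system A z = b (Gaussian elimination with partial pivoting)."""
    k = len(b)
    M = [[float(v) for v in row] + [float(b[r])] for r, row in enumerate(A)]
    for c in range(k):
        p = max(range(c, k), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, k):
            f = M[r][c] / M[c][c]
            for j in range(c, k + 1):
                M[r][j] -= f * M[c][j]
    z = [0.0] * k
    for r in range(k - 1, -1, -1):
        z[r] = (M[r][k] - sum(M[r][j] * z[j] for j in range(r + 1, k))) / M[r][r]
    return z


def mem_calibrate(d, xs, t_x, N, Lam, dLam, d2Lam, tol=1e-12, max_iter=100):
    """Step 1: minimize F(lam) = N^{-1} sum_i Lam_i(lam^t d_i x_i) - lam^t t_x (damped Newton).
    Step 2: return the weights w_i = d_i * Lam_i'(lam^t d_i x_i)."""
    n, k = len(d), len(t_x)

    def s_of(lam, i):                      # argument lam^t d_i x_i of the log-Laplace transform
        return d[i] * sum(lam[a] * xs[i][a] for a in range(k))

    def F(lam):
        return sum(Lam(i, s_of(lam, i)) for i in range(n)) / N - sum(lam[a] * t_x[a] for a in range(k))

    lam = [0.0] * k                        # lam = 0 gives the Horvitz-Thompson weights
    for _ in range(max_iter):
        s = [s_of(lam, i) for i in range(n)]
        grad = [sum(d[i] * dLam(i, s[i]) * xs[i][a] for i in range(n)) / N - t_x[a] for a in range(k)]
        if max(abs(g) for g in grad) < tol:
            break
        hess = [[sum(d[i] ** 2 * d2Lam(i, s[i]) * xs[i][a] * xs[i][b] for i in range(n)) / N
                 for b in range(k)] for a in range(k)]
        step, f0, t = solve(hess, grad), F(lam), 1.0
        while True:                        # backtracking: never increase the convex objective
            cand = [lam[a] - t * step[a] for a in range(k)]
            if F(cand) <= f0:
                break
            t /= 2
        lam = cand
    return [d[i] * dLam(i, s_of(lam, i)) for i in range(n)]


# Hypothetical sample (illustration only): n = 4 units, N = 40, auxiliary vector (1, x_i).
N = 40
d = [10.0] * 4
xs = [[1.0, 2.0], [1.0, 3.5], [1.0, 1.0], [1.0, 4.0]]
t_x = [1.0, 2.8]

# Poisson(1) prior: Lam(s) = e^s - 1, so w_i = d_i exp(lam^t d_i x_i) (always positive).
w_pois = mem_calibrate(d, xs, t_x, N,
                       lambda i, s: exp(s) - 1, lambda i, s: exp(s), lambda i, s: exp(s))

# Gaussian prior N(1, pi_i q_i) with q_i = 1: Lam(s) = s + v_i s^2 / 2 with v_i = 1 / d_i.
v = [1.0 / di for di in d]
w_gauss = mem_calibrate(d, xs, t_x, N,
                        lambda i, s: s + v[i] * s * s / 2,
                        lambda i, s: 1 + v[i] * s,
                        lambda i, s: v[i])
print(w_pois)
print(w_gauss)
```

Backtracking is included because, for priors such as the Poisson one, a full Newton step on the exponential dual objective could overshoot.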

2.4 Extension to generalized calibration and instrument estimation

Proposition 2.2 shows that a calibration estimator is defined using a family of functions $\Lambda'_{\nu_i}, i \in s$, such that the equation $N^{-1}\sum_{i\in s} d_i\Lambda'_{\nu_i}(\lambda^t d_i x_i)\, x_i = t_x$ has a unique solution. A natural generalization, known as generalized calibration (GC) (see [16]), consists in replacing the functions $\lambda \mapsto \Lambda'_{\nu_i}(\lambda^t d_i x_i)$ by more general functions $f_i : \mathbb{R}^k \to \mathbb{R}, i \in s$. Assume that the equation

$$F(\lambda) = N^{-1}\sum_{i\in s} d_i f_i(\lambda)\, x_i = t_x$$

has a unique solution $\hat\lambda$. Assume also that the $f_i$ are continuously differentiable at 0 and satisfy $f_i(0) = 1$, so that $F(0) = \hat t_x^{HT}$. Then take as the solution to the generalized calibration procedure the weights

$$\forall i \in s, \quad \hat w_i = d_i f_i(\hat\lambda).$$

Calibration is of course a particular example of generalized calibration, where we set $f_i : \lambda \mapsto \Lambda'_{\nu_i}(\lambda^t d_i x_i)$ to recover the calibration problem of Proposition 2.2. Even though the method allows a large choice of functions $f_i$, most cases cannot be given a probabilistic interpretation.
However, an interesting particular choice is given by the functions $\lambda \mapsto 1 + z_i^t\lambda$, for vectors $z_i \in \mathbb{R}^k, i \in s$, called instruments (see [18]). If the matrix $X_n := N^{-1}\sum_{i\in s} d_i z_i x_i^t$ is invertible, then the resulting estimator $\hat t_y$, referred to as the instrument estimator obtained with the instruments $z_i$, is given by

$$\hat t_y = \hat t_y^{HT} + (t_x - \hat t_x^{HT})^t X_n^{-1} N^{-1}\sum_{i\in s} d_i z_i y_i. \qquad (5)$$
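Formula (5) can be checked against the weights it comes from. In the sketch below (hypothetical sample and instruments, $k = 2$, helper name `solve2` ours), the multiplier of the weights $w_i = d_i(1 + z_i^t\lambda)$ is obtained from the constraint $X_n^t\lambda = t_x - \hat t_x^{HT}$, and the resulting weighted mean is compared with the closed form.

```python
def solve2(A, b):
    """Solve a 2x2 linear system A z = b by Cramer's rule."""
    det = A[0][0] * A[1][1] - A[0][1] * A[1][0]
    return [(b[0] * A[1][1] - A[0][1] * b[1]) / det,
            (A[0][0] * b[1] - b[0] * A[1][0]) / det]


# Hypothetical sample and instruments (illustration only).
N = 40
d = [10.0] * 4
xs = [[1.0, 2.0], [1.0, 3.5], [1.0, 1.0], [1.0, 4.0]]   # auxiliary vector (1, x_i)
zs = [[1.0, 1.5], [1.0, 2.0], [1.0, 0.5], [1.0, 3.0]]   # instruments z_i
y = [5.0, 8.0, 2.5, 9.0]
t_x = [1.0, 2.8]
n = len(d)

X_n = [[sum(d[i] * zs[i][a] * xs[i][b] for i in range(n)) / N for b in range(2)] for a in range(2)]
t_x_ht = [sum(d[i] * xs[i][a] for i in range(n)) / N for a in range(2)]
t_y_ht = sum(d[i] * y[i] for i in range(n)) / N

# Generalized calibration with f_i(lam) = 1 + z_i^t lam: the constraint reads X_n^t lam = t_x - t_x^HT.
X_n_T = [[X_n[0][0], X_n[1][0]], [X_n[0][1], X_n[1][1]]]
lam = solve2(X_n_T, [t_x[0] - t_x_ht[0], t_x[1] - t_x_ht[1]])
w = [d[i] * (1 + zs[i][0] * lam[0] + zs[i][1] * lam[1]) for i in range(n)]
t_y_weights = sum(w[i] * y[i] for i in range(n)) / N

# Closed form (5): t_y^HT + (t_x - t_x^HT)^t X_n^{-1} N^{-1} sum_i d_i z_i y_i.
g = solve2(X_n, [sum(d[i] * zs[i][a] * y[i] for i in range(n)) / N for a in range(2)])
t_y_formula = t_y_ht + sum((t_x[a] - t_x_ht[a]) * g[a] for a in range(2))
print(t_y_weights, t_y_formula)
```

Choosing $z_i = q_i x_i$ instead would bring back the $\chi^2$ calibration estimator of Section 1.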

Remark (dimension reduction): The estimator $\hat t_y$ defined in (5) can be viewed as the instrument estimator obtained with the one-dimensional auxiliary variable $\hat B^t x_i$ and instruments $\hat B^t z_i, i \in s$, with $\hat B = \big[\sum_{i\in s} d_i z_i x_i^t\big]^{-1}\sum_{i\in s} d_i y_i z_i$. Hence, in the framework of instrument estimation, the original $k$-dimensional calibration constraint can be replaced by the one-dimensional, linearly modified constraint $N^{-1}\sum_{i\in s} w_i\hat B^t x_i = \hat B^t t_x$ without changing the value of the estimator. This reduces the dimension of the problem. Furthermore, it gives an interesting interpretation of the underlying process of calibration. For instance, take the instruments $z_i = x_i, i \in s$. The corresponding variable $B^t x$ is the quadratic projection of $y$ onto the linear space $E_x$ spanned by the components of $x$. In other words, $B^t x$ is a linear approximation of $y$. As a result, the variable $y - B^t x$ has a lower variance than $y$, while its population mean $t_y - B^t t_x$ is known up to $t_y$. So the variable $y - B^t x$ can be used to estimate $t_y$ and provides a more efficient estimator. Since $B$ is unknown, we use $\hat B$ to estimate it. Setting $\check y = y - \hat B^t x$, we have

$$\hat t_y - \hat B^t t_x = N^{-1}\sum_{i\in s} d_i\check y_i.$$

The calibrated estimator $\hat t_y$ thus appears as the Horvitz-Thompson estimator (up to a known additive constant, here $\hat B^t t_x$) of a variable $\check y$ with a lower variance than $y$. This shows that calibration relies on linear regression, since an estimator of $t_y$ is computed by first constructing a linear projection $\hat B^t x$ of $y$ on the subspace $E_x$. Reducing the dimension of the problem amounts to choosing the proper real-valued auxiliary variable and, therefore, the proper one-dimensional linear subspace onto which $y$ is projected.
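The dimension-reduction remark can be verified on a small example. With instruments $z_i = x_i$, the sketch below (hypothetical data, helper names ours) computes the two-dimensional calibrated estimator, then the instrument estimator built on the scalar auxiliary variable $\hat B^t x_i$, also used as instrument, and finally the Horvitz-Thompson estimator of the residual variable $y - \hat B^t x$ shifted by $\hat B^t t_x$.

```python
def solve2(A, b):
    """Solve a 2x2 linear system A z = b by Cramer's rule."""
    det = A[0][0] * A[1][1] - A[0][1] * A[1][0]
    return [(b[0] * A[1][1] - A[0][1] * b[1]) / det,
            (A[0][0] * b[1] - b[0] * A[1][0]) / det]


def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))


# Hypothetical sample (illustration only).
N = 40
d = [10.0] * 4
xs = [[1.0, 2.0], [1.0, 3.5], [1.0, 1.0], [1.0, 4.0]]
y = [5.0, 8.0, 2.5, 9.0]
t_x = [1.0, 2.8]
n = len(d)

# B_hat with instruments z_i = x_i
S = [[sum(d[i] * xs[i][a] * xs[i][b] for i in range(n)) for b in range(2)] for a in range(2)]
B_hat = solve2(S, [sum(d[i] * xs[i][a] * y[i] for i in range(n)) for a in range(2)])

t_y_ht = sum(d[i] * y[i] for i in range(n)) / N
t_x_ht = [sum(d[i] * xs[i][a] for i in range(n)) / N for a in range(2)]
t_y_cal = t_y_ht + dot([t_x[a] - t_x_ht[a] for a in range(2)], B_hat)

# One-dimensional auxiliary variable B_hat^t x_i, also used as instrument
x1 = [dot(B_hat, xs[i]) for i in range(n)]
ratio = sum(d[i] * x1[i] * y[i] for i in range(n)) / sum(d[i] * x1[i] * x1[i] for i in range(n))
t_y_1d = t_y_ht + (dot(B_hat, t_x) - dot(B_hat, t_x_ht)) * ratio

# HT estimator of the residual variable y - B_hat^t x
resid_ht = sum(d[i] * (y[i] - x1[i]) for i in range(n)) / N
print(t_y_cal, t_y_1d, ratio, resid_ht + dot(B_hat, t_x))
```

The scalar coefficient `ratio` equals 1, which is why the one-dimensional constraint leaves the estimator unchanged.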

Note also that the accuracy of the estimator heavily relies on the linear correlation between $y$ and the auxiliary variable. The accuracy could thus be improved by some non-linear transformation $u(x)$ of the original auxiliary variable $x$, provided that $y$ is more correlated with $u(x)$ than with $x$. This is discussed in Section 3.

Instrument estimators play a crucial role in the study of the asymptotic properties of generalized calibration estimators. A classical asymptotic framework in calibration is to let $n$ and $N$ go to infinity simultaneously while the Horvitz-Thompson estimators $\hat t_x^{HT}$ and $\hat t_y^{HT}$ converge at rate $\sqrt{n}$, as described in [2] and [19] for instance. This is our framework here, that is,

$$\|\hat t_x^{HT} - t_x\| = O_p(n^{-1/2}) \quad\text{and}\quad \hat t_y^{HT} - t_y = O_p(n^{-1/2}).$$
In this framework, all GC estimators are $\sqrt{n}$-consistent, as seen in [2].

Definition: We say that two GC estimators $\hat t_y$ and $\tilde t_y$ are asymptotically equivalent if

$$\hat t_y - \tilde t_y = o_p(n^{-1/2}).$$

Proposition 2.3 Let $\hat t_y$ and $\tilde t_y$ be the GC estimators obtained respectively with the functions $f_i, i \in s$, and $g_i, i \in s$. If for all $i \in s$, $\nabla f_i(0) = \nabla g_i(0) = z_i$, and if the matrix $X_n := N^{-1}\sum_{i\in s} d_i z_i x_i^t$ converges toward an invertible matrix $X$, then $\hat t_y$ and $\tilde t_y$ are asymptotically equivalent. In particular, two MEM estimators are asymptotically equivalent as soon as their prior distributions have the same respective variances.

This proposition is a consequence of Result 3 in [2]. It states that for every GC estimator, there exists an instrument estimator with the same asymptotic behavior, built by taking as instruments the gradient vectors of the criterion functions at 0: $z_i = \nabla f_i(0), i \in s$. Consequently, a MEM estimator $\hat t_y$ built with prior distributions $(\nu_i)_{i\in s}$ with mean 1 and respective variances $\pi_i q_i$, for $(q_i)_{i\in s}$ a given positive sequence, satisfies

$$\hat t_y = \hat t_y^{HT} + (t_x - \hat t_x^{HT})^t\hat B + o_p(n^{-1/2}),$$

where $\hat B = \big[\sum_{i\in s} d_i q_i x_i x_i^t\big]^{-1}\sum_{i\in s} d_i q_i x_i y_i$. Indeed, $\nabla f_i(0) = \Lambda''_{\nu_i}(0)\, d_i x_i = \pi_i q_i d_i x_i = q_i x_i$. The negligible term $o_p(n^{-1/2})$ is zero for all $n$ for Gaussian priors $\nu_i \sim \mathcal{N}(1, \pi_i q_i)$, which stresses the important role played by the corresponding dissimilarity (see Example 1 in Section 2.3). Note also that the Gaussian equivalent $\tilde t_y = \hat t_y^{HT} + (t_x - \hat t_x^{HT})^t\hat B$ is the instrument estimator built with the instruments $z_i = q_i x_i$. This choice of instruments, in particular with $q_i = 1$ for all $i \in s$, is often used in practice because of its simplicity and good consistency properties.

3 Efficiency of calibration estimators with the MEM method

By using the auxiliary variable $x$ in the calibration constraint, we implicitly assume that $x$ and $y$ are linearly related. However, other relationships may prevail between the variables, and it may be more accurate to consider some other auxiliary variable $u(x)$. Here, we discuss optimal choices of the function $u : \mathcal{X} \to \mathbb{R}^k$ to be used in the calibration constraint. To do so, we first define a notion of asymptotic efficiency in our model for a fixed auxiliary variable $u(x)$. Then, we study the influence of the choice of the constraint function $u$ and find the optimal choice, leading to the most efficient estimator. Finally, we propose a method based on approximate maximum entropy on the mean, which makes it possible to compute an asymptotically optimal estimate of $t_y$, taking into account both the choice of the constraint function $u$ and the instruments $z_i$.
3.1 Asymptotic efficiency

In order to choose between calibration estimators, we now define a notion of asymptotic efficiency for a given calibration constraint. Although a GC estimator is entirely determined by a family of functions $f_i, i \in s$, only the values $z_i = \nabla f_i(0), i \in s$, matter when studying the asymptotic behavior of the estimator, up to a negligible term of order $o_p(n^{-1/2})$. Let $u : \mathcal{X} \to \mathbb{R}^k$ be a given function, and consider

$$t_u = N^{-1}\sum_{i\in U} u(x_i), \qquad \hat t_u^{HT} = N^{-1}\sum_{i\in s} d_i u(x_i).$$

We make the following assumptions.

A1: $\xi := \{(x_i, y_i), i \in U\}$ are independent realizations of $(X, Y)$, with $\mathbb{E}(Y|X)$ [...] and $\mathbb{E}(Y^2) < \infty$. Denote by $P_X$ and $P_{XY}$ the distributions of $X$ and $(X, Y)$, respectively.

A2: The sampling design $p(\cdot)$ does not depend on $\xi$.

A3: $n$ and $N/n$ tend to infinity. This is denoted by $(n, N/n) \to \infty$.

Furthermore, $u$ is assumed to be measurable and such that $\mathbb{E}(\|u(X)\|^2) < \infty$. Given the constraint function $u$ and instruments $z_i, i \in s$, we denote by $\hat t_y(u)$ the resulting instrument estimator; the dependency on the $z_i$ is dropped for ease of notation. We now study the asymptotic behavior of $\hat t_y(u)$ with respect to the instruments $z_i, i \in s$. Here, the weights $w$ are adapted to the new calibration constraint $N^{-1}\sum_{i\in s} w_i u(x_i) = t_u$, yielding

$$\hat t_y(u) = N^{-1}\sum_{i\in s} w_i y_i = \hat t_y^{HT} + (t_u - \hat t_u^{HT})^t\hat B_u,$$

where $\hat B_u = \big[\sum_{i\in s} d_i z_i u(x_i)^t\big]^{-1}\sum_{i\in s} d_i y_i z_i$ is assumed to be well defined and to converge in probability towards a constant vector $B_u$ as $(n, N/n) \to \infty$.

In order to define a criterion of efficiency, we first need to construct an asymptotic variance lower bound for instrument estimators. Denote by $\mathbb{E}_\xi(t_y - \hat t_y(u))^2$ the quadratic risk of $\hat t_y(u)$ under $p$, the population $\xi$ being fixed. We aim to determine a lower bound for the limit of $n\mathbb{E}_\xi(t_y - \hat t_y(u))^2$ as $(n, N/n) \to \infty$ (provided that the limit exists). The value of the limit of course heavily relies on the asymptotic behavior of the sampling design: without some control on the inclusion probabilities $\pi_i$, we cannot derive consistency properties for instrument estimators. Denote by $\pi_{ij} = \sum_{s \ni i, j} p(s)$ the joint inclusion probability of $i$ and $j$ (with $\pi_{ii} = \pi_i$), and let $\Delta_{ij} = \pi_{ij}d_i d_j - 1$. We make the following technical assumptions.

A4: T.^eu^l = o{N'n-^), EieuEj^i^j = o{N^n-'). 

A5: $\displaystyle\lim_{N/n\to\infty} nN^{-2}\sum_{i\in U}\Delta_{ii} = -\lim_{N/n\to\infty} nN^{-2}\sum_{i\in U}\sum_{j\neq i}\Delta_{ij} = 1.$
Assumption 4 is sufficient to ensure that the HT estimator of a variable $a(x_i, y_i), i \in U$, is $\sqrt{n}$-consistent provided that $\mathbb{E}(a(X, Y)^2) < \infty$. Furthermore, Assumption 5 ensures the existence of its asymptotic variance. Note that these assumptions do not involve the population $\xi$, which makes them easy to check in practical cases. For example, they are fulfilled for the uniform sampling design, that is, when $p$ is such that every sample $s \subset U$ of size $n$ has the same probability of being observed. In that case, the inclusion probabilities are $\pi_i = n/N$ and $\pi_{ij} = n(n-1)/(N(N-1))$, yielding $\Delta_{ii} = N/n - 1$ and $\Delta_{ij} = -(N-n)/(n(N-1))$ for $i \neq j$. We can now state our first result.
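The uniform-design quantities can be computed exactly with rational arithmetic. This sketch (helper names ours) evaluates $\Delta_{ii}$, $\Delta_{ij}$ and the two sums appearing in Assumption 5; they come out as $(N-n)/N$ and $-(N-n)/N$, which tend to $1$ and $-1$ as $N/n \to \infty$.

```python
from fractions import Fraction


def srs_deltas(N, n):
    """Delta_ii and Delta_ij (i != j) for simple random sampling without replacement."""
    pi = Fraction(n, N)
    pi_ij = Fraction(n * (n - 1), N * (N - 1))
    d = 1 / pi
    return pi * d * d - 1, pi_ij * d * d - 1     # pi_ii = pi_i


def a5_sums(N, n):
    """n N^{-2} sum_i Delta_ii and n N^{-2} sum_i sum_{j != i} Delta_ij."""
    dii, dij = srs_deltas(N, n)
    scale = Fraction(n, N * N)
    return scale * N * dii, scale * N * (N - 1) * dij


for N, n in ((100, 10), (10_000, 100), (10**6, 1000)):
    diag, off = a5_sums(N, n)
    print(N, n, float(diag), float(off))        # (N - n)/N and -(N - n)/N
```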

Lemma 1: Suppose that Assumptions 1 to 4 hold. Then

$$n\mathbb{E}_\xi(t_y - \hat t_y(u))^2 \ge \mathrm{var}\big(Y - B_u^t u(X)\big) + o_p(1),$$

with equality if, and only if, Assumption 5 also holds.

We point out that an asymptotic lower bound for the variance can be defined for instrument estimators as soon as Assumptions 1 to 4 hold. This lower bound, denoted by $V^*(u)$, is the minimum of $\mathrm{var}(Y - B^t u(X))$ for $B$ ranging over $\mathbb{R}^k$. It can be computed explicitly if the matrix $\mathrm{var}(u(X))$ is invertible:

$$V^*(u) = \mathrm{var}\Big(Y - \mathrm{cov}(Y, u(X))^t\,[\mathrm{var}(u(X))]^{-1}u(X)\Big).$$
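To see why the choice of $u$ matters, the bound $V^*(u)$ can be estimated by plugging empirical moments in place of population moments (an approximation we make for illustration). Under a hypothetical model $Y = X^2 + 0.1\,\varepsilon$ with $X$ uniform on $[-1, 1]$, the linear choice $u(x) = x$ leaves almost all of $\mathrm{var}(Y)$, while $u(x) = x^2$ reduces the bound to about the noise variance.

```python
import random


def v_star(ys, us):
    """Empirical version of V*(u) = var(Y - b u(X)) with b = cov(Y, u(X)) / var(u(X))."""
    m = len(ys)
    my, mu = sum(ys) / m, sum(us) / m
    cov = sum((yv - my) * (uv - mu) for yv, uv in zip(ys, us)) / m
    var_u = sum((uv - mu) ** 2 for uv in us) / m
    b = cov / var_u
    resid = [yv - b * uv for yv, uv in zip(ys, us)]
    mr = sum(resid) / m
    return sum((r - mr) ** 2 for r in resid) / m


random.seed(0)
xs = [random.uniform(-1.0, 1.0) for _ in range(20_000)]
ys = [x * x + 0.1 * random.gauss(0.0, 1.0) for x in xs]   # hypothetical model Y = X^2 + noise

v_lin = v_star(ys, xs)                      # u(x) = x
v_sq = v_star(ys, [x * x for x in xs])      # u(x) = x^2
print(v_lin, v_sq)                          # roughly 0.099 and 0.01
```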

We say that an estimator $\hat t_y(u)$ is asymptotically efficient if its asymptotic variance is $V^*(u)$. Note that this lower bound cannot be reached if Assumption 5 does not hold. We now come to our second result.

Lemma 2: Suppose that Assumptions 1 to 5 hold. If $\mathrm{var}(u(X))$ is invertible, $\hat t_y(u)$ built with instruments $z_i, i \in s$, is asymptotically efficient if, and only if,

$$\lim_{(n, N/n)\to\infty}\Big[\sum_{i\in s} d_i z_i u(x_i)^t\Big]^{-1}\sum_{i\in s} d_i y_i z_i = [\mathrm{var}(u(X))]^{-1}\mathrm{cov}(Y, u(X)). \qquad (6)$$

In an asymptotic setting and when the calibration function $u$ is fixed, finding the best instruments $z_i, i \in s$, in order to estimate $t_y$ becomes a simple optimization problem, which depends only on the limit $B_u$ of $\hat B_u = \big[\sum_{i\in s} d_i z_i u(x_i)^t\big]^{-1}\sum_{i\in s} d_i y_i z_i$. Asymptotic efficiency is obtained by choosing instruments minimizing the asymptotic variance. Hence, computing $B_u$ provides an efficient and easy way to prove the asymptotic efficiency of an instrument estimator. Moreover, as a consequence of Proposition 2.3, this criterion of asymptotic efficiency can be extended to the set of all generalized calibration estimators: a GC estimator defined by the functions $f_i, i \in s$, is asymptotically efficient if and only if the vectors $z_i = \nabla f_i(0), i \in s$, satisfy (6).
Proof of Lemmas 1 and 2: We first compute the quadratic risk of $\hat t_y(u)$. Due to the non-linearity of the estimator, this is a difficult task, so we rather consider its linear asymptotic expansion $\hat t_{y,lin}(u) := \hat t_y^{HT} + (t_u - \hat t_u^{HT})^t B_u$, where we recall that $B_u$ is the limit (in probability) of $\hat B_u$. Note that the randomness is due to the sampling design $p$; the population $\xi$ is fixed. We obtain after calculation the following expression for the quadratic risk:

$$\mathbb{E}_\xi\big(t_y - \hat t_{y,lin}(u)\big)^2 = N^{-2}\sum_{i,j\in U}\Delta_{ij}\big(y_i - B_u^t u(x_i)\big)\big(y_j - B_u^t u(x_j)\big).$$

Then, the results follow directly from Lemma 5.1 given in the Appendix. ∎
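The quadratic-risk expression in the proof can be confirmed by brute force. The sketch below assumes simple random sampling without replacement on a tiny hypothetical population and a fixed, hypothetical coefficient $B_u$, computes the exact design risk of $\hat t_{y,lin}(u)$ by enumerating all samples, and compares it with $N^{-2}\sum_{i,j}\Delta_{ij}a_i a_j$, where $a_i = y_i - B_u u(x_i)$.

```python
from itertools import combinations
from math import comb

# Hypothetical population and fixed coefficient B_u (illustration only).
y = [2.0, 3.5, 1.0, 5.0, 4.2, 2.8]
u = [1.0, 1.8, 0.4, 2.9, 2.2, 1.5]
B_u = 0.8
N, n = len(y), 3
pi = n / N                                   # SRS without replacement
pi_ij = n * (n - 1) / (N * (N - 1))
t_y, t_u = sum(y) / N, sum(u) / N


def t_lin(s):
    """Linearized estimator t_y^HT + (t_u - t_u^HT) B_u on sample s."""
    ty_ht = sum(y[i] for i in s) / (N * pi)
    tu_ht = sum(u[i] for i in s) / (N * pi)
    return ty_ht + (t_u - tu_ht) * B_u


samples = list(combinations(range(N), n))
risk = sum((t_y - t_lin(s)) ** 2 for s in samples) / comb(N, n)   # exact design risk

a = [y[i] - B_u * u[i] for i in range(N)]


def delta(i, j):
    return (pi if i == j else pi_ij) / pi ** 2 - 1                # Delta_ij = pi_ij d_i d_j - 1


formula = sum(delta(i, j) * a[i] * a[j] for i in range(N) for j in range(N)) / N ** 2
print(risk, formula)
```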
We now look at some examples of commonly used estimators.

Asymptotic variance of some GC estimators 

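1. Optimal instruments.

Assume for the sake of simplicity that $u$ is real-valued. We denote by $B_u^{opt}$ the value of $B_u$ achieving the minimal value of the quadratic risk:

$$B_u^{opt} = \Big[\sum_{i,j\in U}\Delta_{ij}u(x_i)u(x_j)\Big]^{-1}\sum_{i,j\in U}\Delta_{ij}\, y_i\, u(x_j).$$

The corresponding instruments are $z_i = \sum_{j\in U}\Delta_{ij}u(x_j), i \in s$. By Lemma 5.1, $B_u^{opt}$ converges toward $\mathrm{cov}(Y, u(X))/\mathrm{var}(u(X))$ as $(n, N/n) \to \infty$; Equation (6) thus holds in that case. If the sampling design is uniform, we obtain after calculation $z_i = \frac{N(N-n)}{n(N-1)}(u(x_i) - t_u)$ and

$$B_u^{opt} = \frac{\sum_{i\in U} y_i z_i}{\sum_{i\in U} z_i u(x_i)} = \frac{\mathrm{cov}_\xi(y, u(x))}{\mathrm{var}_\xi(u(x))},$$

where $\mathrm{cov}_\xi$ and $\mathrm{var}_\xi$ denote the empirical covariance and variance for the population $\xi$, given by $\mathrm{cov}_\xi(y, u(x)) = N^{-1}\sum_{i\in U} y_i(u(x_i) - t_u)$ and $\mathrm{var}_\xi(u(x)) = \mathrm{cov}_\xi(u(x), u(x))$. Finally,

$$n\mathbb{E}_\xi(t_y - \hat t_{y,lin})^2 = \frac{N-n}{N-1}\,\mathrm{var}_\xi\Big(y - \frac{\mathrm{cov}_\xi(y, u(x))}{\mathrm{var}_\xi(u(x))}\,u(x)\Big).$$

We have $\lim_{(n, N/n)\to\infty} n\mathbb{E}_\xi(t_y - \hat t_{y,lin})^2 = V^*(u)$, as expected, so this estimator is asymptotically efficient. However, the instruments used for its computation depend on the whole population and may therefore be computationally expensive.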



2. MEM estimators.

Take the instruments $z_i = q_i u(x_i), \forall i \in s$, for $(q_i)_{i\in s}$ a positive sequence. As seen in Section 2.4, these instruments describe the asymptotic behavior of MEM estimators built with prior distributions $\nu_i$ with respective variances $\pi_i q_i$. Even though this choice is often used in practice, it does not necessarily lead to an asymptotically efficient estimator $\hat t_y(u)$. Indeed, under regularity conditions on the $q_i$ which ensure the convergence of $\hat B_u$ (basically, the assumptions of Proposition 3.1, which hold for instance if we take $q_i = 1$), we have

$$B_u = \big[\mathbb{E}(u(X)u(X)^t)\big]^{-1}\mathbb{E}(Y u(X)),$$

so that (6) holds if and only if this limit equals $[\mathrm{var}(u(X))]^{-1}\mathrm{cov}(Y, u(X))$. This is true when $u(\cdot) = \mathbb{E}(Y|X = \cdot)$ or for any $u$ such that $\mathbb{E}(u(X)) = 0$; MEM estimators are thus asymptotically efficient in these cases. When this condition is not fulfilled, an easy way to compute an efficient estimator consists in adding the constant variable 1 to the calibration constraint. We then consider the MEM estimator $\hat t_y(v)$, where $v = (1, u)^t : \mathcal{X} \to \mathbb{R}^{k+1}$; the calibrated weights now satisfy the constraints

$$N^{-1}\sum_{i\in s} w_i = 1 \quad\text{and}\quad N^{-1}\sum_{i\in s} w_i u(x_i) = t_u.$$

Here, the matrix $\mathrm{var}(v(X))$ is not invertible, although a direct calculation shows that $V^*(v) = V^*(u)$. So the auxiliary variable is modified, but the asymptotic lower bound is unchanged. Furthermore, the MEM estimator $\hat t_y(v)$ obtained in this way is asymptotically efficient, as proved in the following proposition.

Proposition 3.1 Suppose that Assumptions 1 to 5 hold. Let (z/j)jgs be a family of prob- 
ability measures with mean 1 and respective variance qiiii with {qi)i(zs a given positive se- 
quence. Assume that there exists k, = K,{n, N) G M such that k, X]jes Qidi is bounded away 
from zero and h? '^i^zgilidi)'^ ^ as (n, N/n) +oo. Let v = (1, Vi, Vd) : X ^ R'^+^ 
be a map, where l,vi,...,Vd are linearly independent. Then, the MEM estimator built 
with prior distribution v = <S)i<£s^i and calibration constraint J^i&s'^^i'^i^i) ~ '^^ 
asymptotically efficient. 

3.2 Approximate Maximum Entropy on the Mean 

We now turn on the optimal choice of the auxiliary variable u{x) defining the cali- 
bration constraint. For a given constraint function u, we implicitly take asymptotically 
optimal instruments Zi,i&s, that is, instruments such that the resulting estimator ty{u) 
has asymptotic variance V*{u). Hence, minimizing the asymptotic variance of GC esti- 
mators with respect to u and {zi)i(zs reduces to minimizing V*{u) with respect to u. 



These instruments satisfy Equation ([6]) only if 



[E{u{X)u{Xy)] E(Yu{X)) = [var(n(X)] 



-1 



cov(Y,u{X)). 
In an asymptotic framework, $u$ can be taken real-valued without loss of generality, as discussed in Section 2.3. So, for a real-valued constraint function $u$, $V^*(u)$ is defined as:

$$V^*(u) = \inf_{B\in\mathbb{R}}\mathrm{var}\big(Y - B\,u(X)\big) = \mathrm{var}\Big(Y - \frac{\mathrm{cov}(Y,u(X))}{\mathrm{var}(u(X))}\,u(X)\Big).$$
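To see how the choice of $u$ affects this bound, $V^*(u)$ can be approximated by replacing moments with empirical moments over a large simulated sample. The sketch below assumes the model later used in the numerical simulations ($X$ uniform on $[1,2]$, $Y = \exp(X) + \varepsilon$), here with $\sigma^2 = 0.1$; the function name `v_star` is hypothetical. With $u = \exp$, which is $\mathrm{E}(Y|X=\cdot)$ in this model, $V^*(u)$ should be close to $\sigma^2$, and an affine transformation of $u$ should leave $V^*(u)$ unchanged.

```python
import numpy as np

def v_star(y, u):
    """Empirical V*(u) = var(Y - cov(Y, u) / var(u) * u) for a real-valued u."""
    b = np.cov(y, u, bias=True)[0, 1] / np.var(u)
    return np.var(y - b * u)

rng = np.random.default_rng(1)
sigma2 = 0.1                                    # noise variance
x = rng.uniform(1.0, 2.0, 100_000)
y = np.exp(x) + rng.normal(0.0, np.sqrt(sigma2), x.size)

v_x = v_star(y, x)                              # auxiliary variable u(x) = x
v_exp = v_star(y, np.exp(x))                    # u = E(Y | X = .) = exp
v_affine = v_star(y, 3.0 * np.exp(x) - 2.0)     # affine transformation of exp
print(v_x, v_exp, v_affine)
```

Because $\exp$ is close to linear on $[1,2]$, the gap between `v_x` and `v_exp` is small, but it is of the same order as $\sigma^2$ here.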

A function $v$ for which $V^*(v)$ is minimal over the set $\mathcal{A}_X$ of all real $X$-measurable functions has the form $v(\cdot) = \alpha\,\mathrm{E}(Y|X=\cdot) + \beta$ for some $(\alpha,\beta)\in\mathbb{R}^*\times\mathbb{R}$. Hence, the conditional expectation $\Phi(x) = \mathrm{E}(Y|X=x)$ (or any bijective affine transformation of it) turns out to be the best choice for the auxiliary variable in terms of asymptotic efficiency. In that case, the asymptotic lower bound is given by:

$$V^* = \min_{u\in\mathcal{A}_X} V^*(u) = \mathrm{E}\big(Y - \mathrm{E}(Y|X)\big)^2.$$

For practical applications, this result is not directly usable, since the conditional expectation $\Phi$ depends on the unknown distribution of $(X,Y)$. If $\Phi$ were known, estimating $t_y$ would be easier, since the observed value $t_\Phi = N^{-1}\sum_{i\in U}\Phi(x_i)$ is a $\sqrt{N}$-consistent estimator of $t_y$ and is therefore much more efficient than any calibrated estimator. When $\Phi$ is unknown, a natural solution is to replace $\Phi$ by an estimate and plug it into the calibration constraint. Under regularity conditions made precise below, we show that this approach yields an asymptotically optimal estimator of $t_y$, in the sense that its asymptotic variance equals the lower bound $V^*$ defined above.

For every measurable function $u$, we now denote by $\hat t_y(u)$ the MEM estimator of $t_y$ obtained with prior distributions $\nu_i\sim\mathcal{N}(1,\pi_i)$ and auxiliary variables $u(x)$ and 1. We recall that $\hat t_y(u)$ is $\sqrt{n}$-consistent with asymptotic variance $V^*(u)$, as shown in Proposition 3.1. Moreover, the asymptotic variance of the MEM estimators $\hat t_y(u)$ is minimal for the unknown value $u = \Phi$. The AMEM procedure consists in replacing $\Phi$ by an approximation $\Phi_m$ in the calibration constraint. The resulting AMEM estimator $\hat t_y(\Phi_m)$ is thus easy to compute, yet it still enjoys good convergence properties, as shown in the next proposition.

Proposition 3.2 Suppose that Assumptions 1 to 5 hold. Let $(\Phi_m)_{m\in\mathbb{N}}$ be a sequence of functions independent of the population $U$ and such that

$$\mathrm{E}\big(\Phi(X) - \Phi_m(X)\big)^2 = O(\varphi_m^{-1}) \quad\text{with}\quad \lim_{m\to\infty}\varphi_m = +\infty.$$

Then, the AMEM estimator $\hat t_y(\Phi_m)$ is asymptotically optimal among all GC estimators, in the sense that $nE_p\big(\hat t_y(\Phi_m) - t_y\big)^2$ converges toward $V^*$ as $n, N/n, m\to\infty$.
In this context, approximate maximum entropy on the mean increases the efficiency of calibration estimators when additional information is available, namely an external estimate of the conditional expectation function $\Phi$. Nevertheless, in our model, it is possible to obtain similar properties under weaker conditions.

Corollary 3.3 Suppose that Assumptions 1 to 5 hold. Let $(\Phi_m)_{m\in\mathbb{N}}$ be a sequence of functions satisfying

i) $nE_p\big(\hat t_{\Phi\pi} - t_\Phi - (\hat t_{\Phi_m\pi} - t_{\Phi_m})\big)^2 \to 0$ and ii) $\hat B_{\Phi_m}\to 1$, as $(n, N/n, m)\to\infty$.

Then, the estimator $\hat t_y(\Phi_m)$ is asymptotically efficient.

This corollary does not rule out that the functions $\Phi_m$ are estimated from the data, which was not the case in Proposition 3.2. Hence, it becomes possible to compute an asymptotically efficient estimator of $t_y$ without any external estimator $\Phi_m$ of $\Phi$: a data-driven estimator of $\Phi$ also provides an asymptotically efficient estimator of $t_y$, as soon as the two conditions of Corollary 3.3 are fulfilled.

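Now consider an example of AMEM estimator whose computation is particularly simple and which admits interesting interpretations. We assume for simplicity that the sampling design is uniform, so that $\hat t_{y\pi}$ is simply equal to $n^{-1}\sum_{i\in s} y_i$. Let $(\phi^1, \phi^2, \dots)$ be a linearly independent total family of $L^2(P_X)$. That is, for every measurable function $f : \mathcal{X}\to\mathbb{R}$ such that $\mathrm{E}(f(X)^2) < \infty$, there exists a unique sequence $(a_j)_{j\in\mathbb{N}}$ such that

$$f(X) = \mathrm{E}(f(X)) + \sum_{j\ge 1} a_j\big(\phi^j(X) - \mathrm{E}(\phi^j(X))\big) \quad\text{in } L^2.$$

For all $m$, the projection $\Phi_m$ of $\Phi$ on $\mathrm{span}\{1, \phi^1, \dots, \phi^m\}$ is given by

$$\Phi_m(x) = \mathrm{E}(Y) + \mathrm{cov}(Y,\phi_m(X))^T\,[\mathrm{var}(\phi_m(X))]^{-1}\big(\phi_m(x) - \mathrm{E}(\phi_m(X))\big),$$

where $\phi_m = (\phi^1, \dots, \phi^m)^T$. When $n$ is large enough in comparison to $m$, we can define a natural projection estimator $\hat\Phi_{m,n}$ of $\Phi$ as

$$\hat\Phi_{m,n}(x) = \hat t_{y\pi} + \hat B_{\phi_m}\big(\phi_m(x) - \hat t_{\phi_m\pi}\big),$$

where $\hat B_{\phi_m} = \big[\sum_{i\in s} y_i\big(\phi_m(x_i) - \hat t_{\phi_m\pi}\big)\big]^T\big[\sum_{i\in s}\phi_m(x_i)\big(\phi_m(x_i) - \hat t_{\phi_m\pi}\big)^T\big]^{-1}$. We now consider the AMEM estimator $\hat t_y(\hat\Phi_{m,n})$:

$$\hat t_y(\hat\Phi_{m,n}) = \hat t_{y\pi} + \big(t_{\hat\Phi_{m,n}} - \hat t_{\hat\Phi_{m,n}\pi}\big)\,\hat B_{\hat\Phi_{m,n}},$$

which, after simplification (the sample regression coefficient of $y$ on $\hat\Phi_{m,n}$ is $\hat B_{\hat\Phi_{m,n}} = 1$, and $\hat t_{\hat\Phi_{m,n}\pi} = \hat t_{y\pi}$), gives

$$\hat t_y(\hat\Phi_{m,n}) = \hat t_{y\pi} + \hat B_{\phi_m}\big(t_{\phi_m} - \hat t_{\phi_m\pi}\big) = t_{\hat\Phi_{m,n}}.$$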
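This simplification can be checked numerically. The sketch assumes a uniform design (simple random sampling without replacement), a simulated population, and the hypothetical polynomial basis $\phi_m(x) = (x, \dots, x^m)$ with $m = 3$; the names `phi` and `phi_hat` are illustrative. It computes $\hat\Phi_{m,n}$, then checks that the regression coefficient of $y$ on $\hat\Phi_{m,n}$ in the sample is 1 and that $\hat t_y(\hat\Phi_{m,n})$ coincides with $t_{\hat\Phi_{m,n}}$.

```python
import numpy as np

rng = np.random.default_rng(2)
N, n, m = 10_000, 121, 3
x = rng.uniform(1.0, 2.0, N)
y = np.exp(x) + rng.normal(0.0, np.sqrt(0.1), N)
s = rng.choice(N, size=n, replace=False)          # uniform design (SRS without replacement)

def phi(x, m):
    """Hypothetical basis phi_m(x) = (x, x^2, ..., x^m)."""
    return np.column_stack([x**j for j in range(1, m + 1)])

P_s, y_s = phi(x[s], m), y[s]
t_phi_pi, t_y_pi = P_s.mean(axis=0), y_s.mean()   # H-T estimators under the uniform design
Pc = P_s - t_phi_pi
B_phi = np.linalg.solve(Pc.T @ Pc, Pc.T @ y_s)    # hat B_{phi_m}, stored as a vector

def phi_hat(x):
    """Projection estimator: t_y_pi + B_phi . (phi_m(x) - t_phi_pi)."""
    return t_y_pi + (phi(x, m) - t_phi_pi) @ B_phi

f_s = phi_hat(x[s])
B_hat_Phi = np.cov(y_s, f_s, bias=True)[0, 1] / np.var(f_s)  # slope of y on hat Phi in s
t_amem = t_y_pi + (phi_hat(x).mean() - f_s.mean()) * B_hat_Phi
t_proj = t_y_pi + (phi(x, m).mean(axis=0) - t_phi_pi) @ B_phi
print(B_hat_Phi, t_amem, t_proj, y.mean())
```

The slope equals 1 because $\hat\Phi_{m,n}$ is itself the least-squares fit of $y$ in the sample, so regressing $y$ on it again changes nothing.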



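The objective is to find a path $(m(n), n)_{n\in\mathbb{N}}$ for which the estimator $\hat\Phi_n := \hat\Phi_{m(n),n}$ satisfies the conditions of Corollary 3.3. We know that, for all $m$:

$$nE_p\big(\hat t_{\Phi\pi} - t_\Phi - (\hat t_{\hat\Phi_{m,n}\pi} - t_{\hat\Phi_{m,n}})\big)^2 = nN^{-2}\sum_{i,j\in U}\Delta_{ij}\big(\Phi(x_i) - B^*_{\phi_m}\phi_m(x_i)\big)\big(\Phi(x_j) - B^*_{\phi_m}\phi_m(x_j)\big) + o_p(1),$$

where $B^*_{\phi_m} = \lim_{(n,N/n)\to\infty}\hat B_{\phi_m} = \mathrm{cov}(Y,\phi_m(X))^T[\mathrm{var}(\phi_m(X))]^{-1}$. By Lemma 5.1 we get:

$$\forall m,\quad nE_p\big(\hat t_{\Phi\pi} - t_\Phi - (\hat t_{\hat\Phi_{m,n}\pi} - t_{\hat\Phi_{m,n}})\big)^2 \longrightarrow \mathrm{var}\big(\Phi(X) - \Phi_m(X)\big).$$

Since this convergence holds for every $m$, and $\mathrm{var}(\Phi(X) - \Phi_m(X))\to 0$ as $m\to\infty$ because the family $(\phi^j)$ is total, we can extract a sequence of integers $(m(n))_{n\in\mathbb{N}}$ such that $\hat\Phi_n := \hat\Phi_{m(n),n}$ satisfies the first condition of Corollary 3.3:

$$nE_p\big(\hat t_{\Phi\pi} - t_\Phi - (\hat t_{\hat\Phi_n\pi} - t_{\hat\Phi_n})\big)^2 \longrightarrow 0.$$

The second condition of Corollary 3.3 is satisfied by such a sequence $(\hat\Phi_n)_{n\in\mathbb{N}}$ since, for all $n$, $\hat B_{\hat\Phi_n} = 1$. We conclude that the AMEM estimator is asymptotically optimal.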

Remark: The AMEM estimator is obtained by plugging an estimator $\hat\Phi_n$ of $\Phi$ into the calibration constraint. Note that $\hat t_y(\hat\Phi_n)$ is the MEM estimator obtained with constraint function $\phi_{m(n)}$. Indeed, $\hat t_y(\hat\Phi_n) = \hat t_{y\pi} + \hat B_{\phi_{m(n)}}\big(t_{\phi_{m(n)}} - \hat t_{\phi_{m(n)}\pi}\big)$. This is a consequence of the dimension reduction property of instrument estimators discussed in Section 2.4: $\hat\Phi_n$ is an affine approximation of $y$ by the components of $\phi_{m(n)}(x)$. By increasing the number of constraints appropriately, the projection converges toward the conditional expectation, which yields an efficient estimator of $t_y$.

We can also rewrite the estimator as $\hat t_y(\hat\Phi_n) = t_{\hat\Phi_n}$. In this setting, the AMEM procedure can be interpreted as building an estimator of $t_\Phi$ instead of estimating $t_y$ directly. Because $\Phi(x)$ is not a function of $y$, $t_\Phi$ can be estimated by an empirical mean over the whole population $U$. An estimator of $t_\Phi$ asymptotically yields an estimate of $t_y$ as a consequence of the relation $\mathrm{E}(\mathrm{E}(Y|X)) = \mathrm{E}(Y)$.

4 Numerical simulations 

We now give some numerical applications of our results. We simulated a population $U$ of size $N = 100000$, where $X$ is uniformly distributed on the interval $[1, 2]$, and we take $Y = \exp(X) + \varepsilon$ with $\varepsilon\sim\mathcal{N}(0,\sigma^2)$ an independent noise. Hence, the conditional expectation $\Phi$ introduced in the previous section is simply the function $\exp(\cdot)$. The sampling design is uniform and the sample $s$ has size $n = 121$. We consider six instrument estimators, $\hat t_1$ to $\hat t_6$, each computed on 50 different samples drawn from the fixed population $U$; for $i = 1, \dots, 6$, we give an estimate $V_i$ of the variance computed from these 50 realizations. The first estimator, $\hat t_1$, is the Horvitz-Thompson estimator, and the last one is the AMEM estimator described as an example in Section 3.2, where we took the family $\{X^i : i\in\mathbb{N}\}$ as a basis of $L^2(P_X)$ and set the number of constraint functions to $m = 6$. The construction of the estimators is detailed in the following tables. The results are given for two values of $\sigma^2$, namely $\sigma^2 = 1$ and $\sigma^2 = 0.1$.



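1. $\varepsilon\sim\mathcal{N}(0,1)$:

| Estimator | Auxiliary variable | Instrument | Estimated variance |
|---|---|---|---|
| $\hat t_1$ (H-T estimator) | none | none | $V_1 = 2.07\times10^{-2}$ |
| $\hat t_2$ | $x$ | | $V_2 = 7.8\times10^{-3}$ |
| $\hat t_3$ | $\mathbf{x} = (1, x)$ | $(\mathbf{x}_i)_{i\in s}$ | $V_3 = 7.6\times10^{-3}$ |
| $\hat t_4$ | $\exp(x)$ | $(\exp(x_i))_{i\in s}$ | $V_4 = 7.2\times10^{-3}$ |
| $\hat t_5$ | $\mathbf{x} = (1, \exp(x))$ | | $V_5 = 6.9\times10^{-3}$ |
| $\hat t_6$ (AMEM estimator) | | $(\mathbf{x}_i)_{i\in s}$ | $V_6 = 7.2\times10^{-3}$ |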



We observe that all calibrated estimators have an estimated variance about three times smaller than that of the Horvitz-Thompson estimator. The choice of the auxiliary variable or of the instrument does not seem to have a significant effect on efficiency.

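2. $\varepsilon\sim\mathcal{N}(0, 0.1)$:

| Estimator | Auxiliary variable | Instrument | Estimated variance |
|---|---|---|---|
| $\hat t_1$ (H-T estimator) | none | none | $V_1 = 1.93\times10^{-2}$ |
| $\hat t_2$ | $x$ | | $V_2 = 3.1\times10^{-3}$ |
| $\hat t_3$ | $\mathbf{x} = (1, x)$ | $(\mathbf{x}_i)_{i\in s}$ | $V_3 = 8.7\times10^{-4}$ |
| $\hat t_4$ | $\exp(x)$ | $(\exp(x_i))_{i\in s}$ | $V_4 = 6.8\times10^{-4}$ |
| $\hat t_5$ | $\mathbf{x} = (1, \exp(x))$ | $(\mathbf{x}_i)_{i\in s}$ | $V_5 = 6.7\times10^{-4}$ |
| $\hat t_6$ (AMEM estimator) | | $(\mathbf{x}_i)_{i\in s}$ | $V_6 = 7.0\times10^{-4}$ |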



Here, $X$ explains almost all of $Y$ since the variance of $\varepsilon$ is low ($\sigma^2 = 0.1$). In that case, the choice of the auxiliary variable and of the instrument plays a more important role. We notice a significant difference between $\hat t_2$ and $\hat t_3$, which points out the importance of the instrument. More specifically, the instrument $(x_i - \hat t_{x\pi})_{i\in s}$ (which is equivalent to adding the constant 1 as an auxiliary variable) provides a better estimator than $x_i$. Furthermore, using the auxiliary variable $u(x) = \exp(x)$ provides the best estimators in terms of variance, as $V_4$ and $V_5$ are the smallest estimated variances. These estimators can be viewed as oracles, since the auxiliary variable used in that case is the optimal choice but is in general unknown (see Section 3.2). As expected from the second example of Section 3.1, the difference between $\hat t_4$ and $\hat t_5$ is not significant. Finally, the variance of the AMEM estimator lies between that of the standard calibrated estimator $\hat t_3$ and that of the oracles, which shows that it is more efficient than $\hat t_3$.
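A simulation in the spirit of this section can be sketched as follows. It is not the authors' code and will not reproduce the tables exactly. It assumes Gaussian priors $\mathcal{N}(1,\pi_i)$, so that under the uniform design each MEM estimator reduces to an instrument estimator with intercept (Proposition 2.3), and it compares the Horvitz-Thompson estimator with estimators calibrated on $(1,x)$, on $(1,\exp(x))$, and on $(1, x, \dots, x^m)$ (the AMEM example of Section 3.2). The function names and defaults are illustrative choices.

```python
import numpy as np

def calibrated(y_s, v_s, t_v):
    """Instrument estimator with intercept under a uniform design:
    mean(y_s) + (t_v - mean(v_s)) . B, where B holds the least-squares
    slopes of y on (1, v) within the sample."""
    vc = v_s - v_s.mean(axis=0)
    B = np.linalg.lstsq(vc, y_s - y_s.mean(), rcond=None)[0]
    return y_s.mean() + (t_v - v_s.mean(axis=0)) @ B

def simulate(sigma2, N=100_000, n=121, reps=50, m=6, seed=0):
    """Monte Carlo variances of the H-T and calibrated estimators."""
    rng = np.random.default_rng(seed)
    x = rng.uniform(1.0, 2.0, N)
    y = np.exp(x) + rng.normal(0.0, np.sqrt(sigma2), N)
    aux = {  # auxiliary variables u(x); the constant 1 is handled by centring
        "(1, x)": x[:, None],
        "(1, exp(x))": np.exp(x)[:, None],
        "AMEM (1, x, ..., x^m)": np.column_stack([x**j for j in range(1, m + 1)]),
    }
    t_u = {k: u.mean(axis=0) for k, u in aux.items()}   # known population means
    est = {"H-T": [], **{k: [] for k in aux}}
    for _ in range(reps):
        s = rng.choice(N, size=n, replace=False)         # simple random sample
        est["H-T"].append(y[s].mean())
        for k, u in aux.items():
            est[k].append(calibrated(y[s], u[s], t_u[k]))
    return {k: np.var(e, ddof=1) for k, e in est.items()}

for sigma2 in (1.0, 0.1):
    print(sigma2, simulate(sigma2))
```

Centring $y$ and $u$ before the least-squares fit is equivalent to including the constant 1 in the calibration constraint, which is why the intercept does not appear explicitly.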

5 Appendix 

5.1 Technical lemma 

Lemma 5.1 Let $\mathcal{F}$ be the set of all functions $f : \mathcal{X}\times\mathbb{R}\to\mathbb{R}$ such that $\mathrm{E}(|f(X,Y)|^2)$ is finite (we set $f_i = f(x_i,y_i)$ for all $i\in U$). Under Assumptions 1, 2 and 4,

$$\forall f\in\mathcal{F},\quad nN^{-2}\sum_{i,j\in U}\Delta_{ij}\,f_i f_j \ge \mathrm{var}(f(X,Y)) + o_p(1)$$

as $(n,N/n)\to\infty$, with equality if and only if Assumption 5 also holds. In that case, the quantity $nN^{-2}\sum_{i,j\in U}\Delta_{ij}\,f_i g_j$ converges in probability toward $\mathrm{cov}(f(X,Y),g(X,Y))$ for all $f,g\in\mathcal{F}$ as $(n,N/n)\to\infty$.

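Proof of Lemma 5.1

Assumptions 1, 2 and 4 yield, for all $f\in\mathcal{F}$:

$$nN^{-2}\sum_{i,j\in U}\Delta_{ij}f_if_j = \Big(nN^{-2}\sum_{i\in U}\Delta_{ii}\Big)\mathrm{E}\big(f(X,Y)^2\big) + \Big(nN^{-2}\sum_{i\neq j}\Delta_{ij}\Big)\big(\mathrm{E}(f(X,Y))\big)^2 + o_p(1).$$

Let $\mathcal{P}_n(U)$ denote the set of all subsamples $s$ of $U$ with $n$ elements. By Jensen's inequality, we get

$$\sum_{i,j\in U}\Delta_{ij} = \sum_{s\in\mathcal{P}_n(U)} p(s)\Big(\sum_{i\in s} d_i\Big)^2 - N^2 \ge \Big(\sum_{s\in\mathcal{P}_n(U)} p(s)\sum_{i\in s} d_i\Big)^2 - N^2 = 0,$$

which implies that $\sum_{i\neq j}\Delta_{ij} \ge -\sum_{i\in U}\Delta_{ii}$. Thus:

$$nN^{-2}\sum_{i,j\in U}\Delta_{ij}f_if_j \ge \Big(nN^{-2}\sum_{i\in U}\Delta_{ii}\Big)\mathrm{var}(f(X,Y)) + o_p(1).$$

Since $\sum_{i\in U}\pi_i = n$, we know that $nN^{-2}\sum_{i\in U}\Delta_{ii} \ge 1 - nN^{-1}$ by convexity of $x\mapsto 1/x$ on $\mathbb{R}_+^*$. Hence

$$nN^{-2}\sum_{i,j\in U}\Delta_{ij}f_if_j \ge \mathrm{var}(f(X,Y)) + o_p(1)$$

as $(n,N/n)\to\infty$. Furthermore, this is not an equality for all $f\in\mathcal{F}$ if Assumption 5 does not hold. We show the second part of the lemma using the same pattern as in the beginning of the proof, applied to $f$ and $g$. In particular, it holds when $f = g$.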
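The Jensen step can be checked on a toy design. The sketch below is illustrative: it enumerates every subsample of size $n = 3$ from $N = 6$ units, assigns them arbitrary random probabilities $p(s)$ (a hypothetical fixed-size design), computes $\pi_i$, $\pi_{ij}$ and $\Delta_{ij} = \pi_{ij}/(\pi_i\pi_j) - 1$, and verifies the identity and the two inequalities used in the proof.

```python
import itertools
import numpy as np

rng = np.random.default_rng(3)
N, n = 6, 3
samples = list(itertools.combinations(range(N), n))  # P_n(U): all subsets of size n
p = rng.dirichlet(np.ones(len(samples)))              # arbitrary design p(s), sums to 1

pi = np.zeros(N)          # first-order inclusion probabilities pi_i
pi2 = np.zeros((N, N))    # second-order pi_ij, with pi_ii = pi_i on the diagonal
for ps, s in zip(p, samples):
    idx = np.array(s)
    pi[idx] += ps
    pi2[np.ix_(idx, idx)] += ps

delta = pi2 / np.outer(pi, pi) - 1.0      # Delta_ij = pi_ij / (pi_i pi_j) - 1
d = 1.0 / pi                              # Horvitz-Thompson weights

# Jensen: sum_s p(s) (sum_{i in s} d_i)^2 >= (sum_s p(s) sum_{i in s} d_i)^2 = N^2
second_moment = sum(ps * d[list(s)].sum() ** 2 for ps, s in zip(p, samples))
trace_term = n / N**2 * np.trace(delta)   # n N^{-2} sum_i Delta_ii
print(delta.sum(), second_moment - N**2, trace_term, 1 - n / N)
```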

5.2 Proofs 

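Proof of Theorem 2.1

For all $w\in\mathbb{R}^n$, let $f_w : \mathbb{R}^n\to\mathbb{R}_+$ be the unique minimizer of the functional $f\mapsto K(f\nu,\nu)$ on the set $\mathcal{F}_w = \{f : \int_{\mathbb{R}^n}(\tau - \pi w)f(\tau)\,d\nu(\tau) = 0\}$. We have:

$$f_w = \arg\min_{f\in\mathcal{F}_w}\int f(\log f - 1)\,d\nu.$$

We calculate the Lagrangian $\mathcal{L}(\lambda, f)$ associated with the problem:

$$\mathcal{L}(\lambda,f) = \int_{\mathbb{R}^n}\big[f(\tau)\log f(\tau) - f(\tau)\big]\,d\nu(\tau) - \lambda^T\int_{\mathbb{R}^n}(\tau - \pi w)f(\tau)\,d\nu(\tau),$$

where $\lambda\in\mathbb{R}^n$ is the Lagrange multiplier. The first-order conditions are:

$$\forall\tau\in\mathbb{R}^n,\quad \log f(\tau) = \lambda^T(\tau - \pi w).$$

Hence, for all $\tau$, $f_w(\tau) = e^{\lambda_w^T(\tau - \pi w)}$, where $\lambda_w$ satisfies:

$$\int_{\mathbb{R}^n}(\tau - \pi w)\,e^{\lambda_w^T(\tau - \pi w)}\,d\nu(\tau) = 0 \iff \lambda_w = \arg\min_{\lambda\in\mathbb{R}^n}\int_{\mathbb{R}^n} e^{\lambda^T(\tau - \pi w)}\,d\nu(\tau).$$

Let $S = \{(w_i)_{i\in s} : N^{-1}\sum_{i\in s} w_i x_i = t_x\}$. We notice that

$$\begin{aligned}
\hat w = \mathrm{E}_{\hat\nu}(W) &= \arg\min_{w\in S}\Big\{\min_{f\in\mathcal{F}_w}\int f(\log f - 1)\,d\nu\Big\}\\
&= \arg\min_{w\in S}\Big\{\int f_w(\log f_w - 1)\,d\nu\Big\}\\
&= \arg\min_{w\in S}\Big\{\lambda_w^T\int(\tau - \pi w)\,e^{\lambda_w^T(\tau - \pi w)}\,d\nu(\tau) - \int e^{\lambda_w^T(\tau - \pi w)}\,d\nu(\tau)\Big\}\\
&= \arg\min_{w\in S}\Big\{-\min_{\lambda\in\mathbb{R}^n} e^{-\lambda^T\pi w}\int e^{\lambda^T\tau}\,d\nu(\tau)\Big\},
\end{aligned}$$

by definition of $\lambda_w$. Recall that $\nu = \otimes_{i\in s}\nu_i$. Since the function $t\mapsto -\log t$ is decreasing, we have that

$$\min_{\lambda\in\mathbb{R}^n}\Big\{e^{-\lambda^T\pi w}\int e^{\lambda^T\tau}\,d\nu(\tau)\Big\} = \exp\Big(-\sup_{\lambda\in\mathbb{R}^n}\Big\{\sum_{i\in s}\Big[\lambda_i\pi_i w_i - \log\int e^{\lambda_i\tau_i}\,d\nu_i(\tau_i)\Big]\Big\}\Big).$$

The supremum being taken over $\lambda\in\mathbb{R}^n$, we see that

$$\sup_{\lambda\in\mathbb{R}^n}\Big\{\sum_{i\in s}\Big[\lambda_i\pi_i w_i - \log\int e^{\lambda_i\tau_i}\,d\nu_i(\tau_i)\Big]\Big\} = \sum_{i\in s}\sup_{\lambda_i\in\mathbb{R}}\Big\{\lambda_i\pi_i w_i - \log\int e^{\lambda_i\tau_i}\,d\nu_i(\tau_i)\Big\} = \sum_{i\in s}\Lambda^*_{\nu_i}(\pi_i w_i).$$

Finally we obtain:

$$\hat w = \arg\min_{w\in S}\Big\{-\exp\Big(-\sum_{i\in s}\Lambda^*_{\nu_i}(\pi_i w_i)\Big)\Big\} = \arg\min_{w\in S}\sum_{i\in s}\Lambda^*_{\nu_i}(\pi_i w_i).$$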
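The last step identifies the MEM criterion with a sum of Legendre transforms $\Lambda^*_{\nu_i}(s) = \sup_\lambda\{\lambda s - \Lambda_{\nu_i}(\lambda)\}$. As an illustration (the two priors are examples chosen here, not taken from the text), a grid maximization recovers the closed forms for two priors with mean 1: a Poisson(1) prior, with $\Lambda(t) = e^t - 1$ and $\Lambda^*(s) = s\log s - s + 1$, and a Gaussian $\mathcal{N}(1,q)$ prior, with $\Lambda(t) = t + qt^2/2$ and $\Lambda^*(s) = (s-1)^2/(2q)$. Both vanish at $s = 1$, so $\sum_{i\in s}\Lambda^*_{\nu_i}(\pi_i w_i)$ behaves like a distance between $w$ and the Horvitz-Thompson weights $d$.

```python
import numpy as np

t = np.linspace(-10.0, 10.0, 200_001)   # grid of multipliers lambda_i, step 1e-4

def legendre(log_laplace, s):
    """Grid approximation of Lambda*(s) = sup_t {t s - Lambda(t)}."""
    return np.max(t * s - log_laplace(t))

q = 0.5
poisson = lambda t: np.expm1(t)          # Poisson(1) prior: Lambda(t) = e^t - 1
gauss = lambda t: t + q * t**2 / 2.0     # N(1, q) prior: Lambda(t) = t + q t^2 / 2

s_values = np.array([0.5, 1.0, 2.0])     # values of pi_i * w_i
num_poisson = np.array([legendre(poisson, s) for s in s_values])
num_gauss = np.array([legendre(gauss, s) for s in s_values])
exact_poisson = s_values * np.log(s_values) - s_values + 1.0
exact_gauss = (s_values - 1.0) ** 2 / (2.0 * q)
print(num_poisson, exact_poisson)
print(num_gauss, exact_gauss)
```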
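Proof of Proposition 2.2

This is a classic convex optimization problem. Let $L$ be the Lagrangian associated with the problem:

$$L(w,\lambda) = \sum_{i\in s}\Lambda^*_{\nu_i}(\pi_i w_i) - \lambda^T\Big(\sum_{i\in s} w_i x_i - Nt_x\Big),$$

where $\lambda$ is the Lagrange multiplier. The solutions to the first-order conditions satisfy, for all $i\in s$,

$$w_i = d_i\,(\Lambda^{*\prime}_{\nu_i})^{-1}(\lambda^T d_i x_i),$$

where we recall that the functions $\Lambda^*_{\nu_i}$ are assumed to be strictly convex, so that $(\Lambda^{*\prime}_{\nu_i})^{-1}$ exists for all $i$ and is equal to $\Lambda'_{\nu_i}$. It now suffices to plug the solutions of the first-order conditions into the constraint to obtain an expression of the solution $\hat\lambda$:

$$N^{-1}\sum_{i\in s} d_i\,\Lambda'_{\nu_i}(d_i x_i^T\hat\lambda)\,x_i = t_x \iff \hat\lambda = \arg\min_{\lambda}\Big\{\sum_{i\in s}\Lambda_{\nu_i}(d_i x_i^T\lambda) - N\lambda^T t_x\Big\}.$$

The equivalence is justified by the fact that $\Lambda_{\nu_i}$ is strictly convex, and therefore so is $\lambda\mapsto\sum_{i\in s}\Lambda_{\nu_i}(d_i x_i^T\lambda) - N\lambda^T t_x$. For that reason, $\hat\lambda$ is uniquely defined. We finally obtain an expression of the calibrated weights:

$$\forall i\in s,\quad \hat w_i = d_i\,\Lambda'_{\nu_i}(d_i x_i^T\hat\lambda).$$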
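The dual characterization suggests computing $\hat\lambda$ by Newton's method on the strictly convex function $\lambda\mapsto\sum_{i\in s}\Lambda_{\nu_i}(d_i x_i^T\lambda) - N\lambda^T t_x$. A minimal sketch, assuming Poisson(1) priors for every $i$ (so $\Lambda_{\nu_i}(t) = e^t - 1$ and $\hat w_i = d_i\exp(d_i x_i^T\hat\lambda)$, which keeps the weights positive), a simple random sample and the auxiliary vector $(1,x)$. The function name `mem_weights_poisson` is hypothetical, and a basic backtracking line search is added for robustness.

```python
import numpy as np

def mem_weights_poisson(X_s, d, t_x, N, tol=1e-9, max_iter=100):
    """MEM calibration weights assuming Poisson(1) priors, Lambda(t) = e^t - 1.

    Minimises the convex dual  sum_i Lambda(d_i x_i^T lam) - N lam^T t_x  by a
    damped Newton method and returns w_i = d_i * exp(d_i x_i^T lam)."""
    def dual(lam):
        return np.sum(np.expm1(d * (X_s @ lam))) - N * lam @ t_x

    lam = np.zeros(X_s.shape[1])
    for _ in range(max_iter):
        w = d * np.exp(d * (X_s @ lam))
        grad = X_s.T @ w - N * t_x                   # sum_i w_i x_i - N t_x
        if np.max(np.abs(grad)) < tol * N:
            break
        hess = (X_s * (d * w)[:, None]).T @ X_s      # sum_i d_i w_i x_i x_i^T
        step = np.linalg.solve(hess, grad)
        eta = 1.0
        while dual(lam - eta * step) > dual(lam) and eta > 1e-10:
            eta /= 2.0                               # backtracking line search
        lam = lam - eta * step
    return d * np.exp(d * (X_s @ lam)), lam

rng = np.random.default_rng(4)
N, n = 1000, 50
x = rng.uniform(1.0, 2.0, N)
X = np.column_stack([np.ones(N), x])     # auxiliary vector (1, x)
t_x = X.mean(axis=0)                     # known population means
s = rng.choice(N, size=n, replace=False)
d = np.full(n, N / n)                    # Horvitz-Thompson weights under SRS
w, lam = mem_weights_poisson(X[s], d, t_x, N)
print(X[s].T @ w / N, t_x)
```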

Proof of Proposition 2.3

Let $F : \lambda\mapsto N^{-1}\sum_{i\in s} d_i F_i(\lambda)x_i$ and $G : \lambda\mapsto N^{-1}\sum_{i\in s} d_i g_i(\lambda)x_i$. We call respectively $\hat\lambda$ and $\tilde\lambda$ the solutions to $F(\lambda) = t_x$ and $G(\lambda) = t_x$. We have

$$F(\lambda) = F(0) + X_n\lambda + o(\|\lambda\|)$$

and then $(t_x - \hat t_{x\pi}) = X_n\hat\lambda + o(\|\hat\lambda\|)$. By assumption, $X_n$ is invertible for large values of $n$ since it converges toward an invertible matrix $X$. Thus, whenever $\hat t_{x\pi}$ is close enough to $t_x$, there exists $\lambda_0$ in a neighborhood of $0$ such that $F(\lambda_0) = t_x$. By uniqueness of the solution, we conclude that $\lambda_0 = \hat\lambda$. Hence, for large values of $n$,

$$\hat\lambda = X_n^{-1}(t_x - \hat t_{x\pi}) + o_p(n^{-1/2}).$$

A similar reasoning for $\tilde\lambda$ yields $\|\hat\lambda - \tilde\lambda\| = o_p(n^{-1/2})$. Thus, $\hat\lambda$ and $\tilde\lambda$ converge toward $0$, and by Taylor's formula:

$$F_i(\hat\lambda) = 1 + z_i^T\hat\lambda + o_p(n^{-1/2}) = 1 + z_i^T\tilde\lambda + o_p(n^{-1/2}) = g_i(\tilde\lambda) + o_p(n^{-1/2}).$$

It follows that $\hat t_y$ and $\tilde t_y$ are asymptotically equivalent.

We know that MEM estimation reduces to taking $F_i(\lambda) = \Lambda'_{\nu_i}(d_i x_i^T\lambda)$ in a GC procedure. Hence, in that case, $\nabla F_i(0) = d_i\Lambda''_{\nu_i}(0)\,x_i$. Since the variance of a probability measure $\nu_i$ is given by $\Lambda''_{\nu_i}(0)$, two MEM estimators with prior distributions having the same respective variances are asymptotically equivalent. Furthermore, a Gaussian prior $\nu_i\sim\mathcal{N}(1, q_i\pi_i)$ has log-Laplace transform $\Lambda_{\nu_i} : t\mapsto q_i\pi_i t^2/2 + t$, which corresponds to $F_i(\lambda) = \Lambda'_{\nu_i}(d_i x_i^T\lambda) = 1 + q_i x_i^T\lambda$. The resulting MEM estimator is thus the instrument estimator obtained with instruments $z_i = q_i x_i$, $i\in s$.
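The Gaussian case can be verified directly: with $\nu_i\sim\mathcal{N}(1,q_i\pi_i)$ the calibration equation is linear in $\lambda$, and the MEM estimator should coincide with the instrument estimator with $z_i = q_ix_i$. The sketch assumes simple random sampling, the auxiliary vector $(1,x)$, and an arbitrary positive sequence $q_i$ drawn at random; all names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(5)
N, n = 1000, 60
x_pop = np.column_stack([np.ones(N), rng.uniform(1.0, 2.0, N)])  # auxiliary (1, x)
y_pop = np.exp(x_pop[:, 1]) + rng.normal(0.0, 1.0, N)
t_x = x_pop.mean(axis=0)

s = rng.choice(N, size=n, replace=False)
X, y = x_pop[s], y_pop[s]
d = np.full(n, N / n)                  # HT weights d_i = 1 / pi_i under SRS
q = rng.uniform(0.5, 2.0, n)           # arbitrary positive sequence (q_i)

# MEM with Gaussian priors N(1, q_i pi_i): Lambda_i'(t) = 1 + q_i pi_i t, hence
# w_i = d_i (1 + q_i x_i^T lam) and the calibration constraint is linear in lam.
t_x_pi = X.T @ d / N
M = (X * (d * q)[:, None]).T @ X / N   # N^{-1} sum_i d_i q_i x_i x_i^T
lam = np.linalg.solve(M, t_x - t_x_pi)
w = d * (1.0 + q * (X @ lam))
t_mem = w @ y / N

# Instrument estimator with instruments z_i = q_i x_i
Z = X * q[:, None]
B_hat = np.linalg.solve((Z * d[:, None]).T @ X, (Z * d[:, None]).T @ y)
t_instr = d @ y / N + (t_x - t_x_pi) @ B_hat
print(t_mem, t_instr)
```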
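Proof of Proposition 3.1

We set $u = (v_1, \dots, v_d)$; the matrix $\mathrm{var}(u(X))$ is invertible. By assumption on $(q_i)_{i\in s}$, we have

$$\kappa\sum_{i\in s} d_i q_i\,y_i\,v(x_i) = \Big(\kappa\sum_{i\in s} d_i q_i\Big)\mathrm{E}(Y v(X)) + o_p(1)$$

and

$$\kappa\sum_{i\in s} d_i q_i\,v(x_i)v(x_i)^T = \Big(\kappa\sum_{i\in s} d_i q_i\Big)\mathrm{E}(v(X)v(X)^T) + o_p(1).$$

Since $\kappa\sum_{i\in s} d_i q_i$ is bounded away from zero, it follows that

$$\hat B_v = \Big[\sum_{i\in s} d_i q_i\,v(x_i)v(x_i)^T\Big]^{-1}\sum_{i\in s} d_i q_i\,y_i\,v(x_i) \longrightarrow [\mathrm{E}(v(X)v(X)^T)]^{-1}\mathrm{E}(Y v(X)) = B_v.$$

By simple algebra, we obtain the functional equality $B_v^T v(\cdot) = \mathrm{cov}(Y,u(X))^T[\mathrm{var}(u(X))]^{-1}u(\cdot) + K$, where $K$ is a constant and therefore does not modify the value of the variance. More precisely, the asymptotic variance of $\hat t_y(v)$ is

$$\mathrm{var}\Big(Y - \big(\mathrm{cov}(Y,u(X))^T[\mathrm{var}(u(X))]^{-1}u(X) + K\big)\Big) = V^*(u),$$

which proves that the MEM estimator $\hat t_y(v)$ is asymptotically efficient.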
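The functional equality used above is the familiar fact that adding an intercept to a least-squares fit reproduces the centred slopes. The sketch below checks it on the empirical distribution of a simulated sample, which plays the role of $(X,Y)$; the choice $u(x) = (x, x^2)$ is an arbitrary example.

```python
import numpy as np

rng = np.random.default_rng(6)
M = 50_000
x = rng.uniform(1.0, 2.0, M)
y = np.exp(x) + rng.normal(0.0, 1.0, M)
U = np.column_stack([x, x**2])            # example u = (v_1, v_2), so d = 2
V = np.column_stack([np.ones(M), U])      # v = (1, u)

# B_v = [E(v v^T)]^{-1} E(Y v), moments taken under the empirical distribution
B_v = np.linalg.solve(V.T @ V / M, V.T @ y / M)

# Centred slopes [var u]^{-1} cov(Y, u)
Uc = U - U.mean(axis=0)
slope = np.linalg.solve(Uc.T @ Uc / M, Uc.T @ (y - y.mean()) / M)

K = V @ B_v - U @ slope                   # should not depend on x
print(B_v, slope, K.max() - K.min())
```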

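Proof of Proposition 3.2

We decompose the AMEM estimator as follows:

We have, by assumption,

$$nE_p\big(\hat t_{\Phi\pi} - t_\Phi - (\hat t_{\Phi_m\pi} - t_{\Phi_m})\big)^2 = O_p(\varphi_m^{-1}) \quad\text{and}\quad (\hat B_{\Phi_m} - 1) = O_p(\varphi_m^{-1/2})$$

as $n, N/n\to\infty$, uniformly in $m$ (see the proof of Lemma 1 in [12]). Hence, the terms $\big(\hat t_{\Phi\pi} - t_\Phi - (\hat t_{\Phi_m\pi} - t_{\Phi_m})\big)$ and $(\hat B_{\Phi_m} - 1)(t_{\Phi_m} - \hat t_{\Phi_m\pi})$ are asymptotically negligible in comparison to $\big(\hat t_{y\pi} - t_y - (\hat t_{\Phi\pi} - t_\Phi)\big)$ as $n, N/n, m\to\infty$. We conclude using Result 2 and Lemma …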

Proof of Corollary 3.3

All conditions are fulfilled, so the proof of Proposition 3.2 remains valid in that case.